Circuit Stochastics and Behavior

PAGE 215 

Although animal behavioral responses to a stimulus are predictable on average, 
individual responses are highly variable. Gordus et al. study neuronal activity in 
response to an attractive odor in C. elegans and find that the stochastic relative 
activity states of interneurons downstream of sensory neurons determine the 
probability of the odor causing a motor output at a given time. 

Cancer in a Clamshell 

PAGE 255 

Massive loss of marine bivalve animals is happening throughout the world. 
Metzger et al. report that this is caused by a fatal form of cancer spreading 
through horizontal clonal transmission of tumor cells that likely arose from a 
single clam. 



A Gut Sense about Serotonin 

PAGE 264 

The gastrointestinal tract contains much of the body’s serotonin, yet the control of its synthesis and breakdown is poorly 
understood. Yano et al. reveal that metabolites generated by spore-forming microbes in the gut microbiome promote host 
serotonin biosynthesis. Disrupting this communication impacts intestinal motility and hemostasis. 

All Together Now!

PAGE 277 

Coordinated organ behavior is crucial for an effective response to environmental stimuli. Chen et al. demonstrate collective 
responses of hair follicles that allow an all-or-none decision to regenerate after plucking through a combination of molecular 
cues and immune cell recruitment. 



Unmasking Long Noncoding RNA-Protein Complexes 

PAGE 404 

The long noncoding RNA Xist is the master regulator of mammalian dosage compensation. Chu et al. develop a method to
identify the composition of long noncoding RNA-protein complexes in vivo, providing insights into its temporal assembly 
and domain architecture. 



All about That Shape 



PAGE 307 

Although protein-DNA binding specificity is mediated by hydrogen bonds and hydrophobic contacts, these modes of recog- 
nition are not sufficient to explain specificity, particularly for factors with highly similar DNA binding domains. Abe et al. show 
that discrete DNA structural features play an independent and direct role in binding specificity by Hox proteins and that knowl- 
edge facilitates the de novo prediction of DNA binding specificities. 



A Dirty Sponge 

PAGE 319 

Karreth et al. report that the BRAF pseudogene BRAFP1 regulates the activity of 
its parental gene by competing for miRNA binding. Transgenic expression 
induces aggressive lymphoma in mice, and copy number analyses of human 
cancers further suggest an oncogenic function for the pseudogene. 

Neuronal LINE Up 

PAGE 228 

Somatic genome mosaicism among neurons has the potential to impact brain 
function. Upton et al. show that LINE-1 retrotransposons mobilize extensively 
in hippocampal neurons, preferentially in hippocampally expressed loci. They 
are depleted from mature neurons when oriented in the most deleterious 
configuration to host genes, suggesting functional significance. 

Actin Fist Bump

PAGE 361 

Cellular movement during development often originates when two cells collide 
and then repel each other, a process known as contact inhibition. Although it 
has been thought to be an uncoordinated reaction to contact, Davis et al. 
now show that it requires physical coupling of the flowing actin networks of 
the contacting cells. Actin acts as a mechanotransducer, allowing the cells to 
sense each other and coordinate their behaviors. 



A Little Trim Keeps Growth under Control

PAGE 333 

Whereas ubiquitination targets many proteins for complete proteasomal degradation,
modification of NF-κB p105 leads to selective proteolysis to generate
p50. Kravtsova-Ivantsiv et al. identify the responsible E3 ligase and demonstrate
that appropriate production of p50 is crucial for proper NF-κB signaling,
as lowered levels of the ligase and p50 contribute to loss of growth control in 
tumorigenesis. 



Actin’ Like a Pathogen 

PAGE 348 

Intracellular bacteria co-opt the host cell cytoskeleton to move in the cytoplasm and to disseminate. Benanti et al. find that 
pathogenic and non-pathogenic Burkholderia species use different strategies to drive actin-based motility. Pathogenic bacteria
express a protein that mimics host Ena/VASP actin polymerases, which allows them to organize actin filaments in a way
that propels their movement and facilitates cell fusion during infection. 



Unified Theory of Migration 

PAGE 374 

Different cell types have intrinsically distinct migration patterns and responses to their environment. Maiuri et al. discover a 
simple rule that unifies the motile behavior of all cells: the straightness of movement or persistence is always an exponential 
function of their speed. Using this principle, they construct a physical model that predicts cell trajectories and reveals new 
insights about the molecular control of cell motility. 

Immunity and Genetics InterTWINed 

PAGE 387 

To understand how genetic variations may play a role in the control of the homeostasis and disturbances of the immune system,
Roederer et al. analyze 78,000 immune traits in 700 female twins and relate heritable traits with almost 300 single-nucleotide 
polymorphisms. The data link genetic control elements associated with normal immune traits to common autoimmune and 
infectious diseases, providing a shortcut to identifying potential mechanisms of immune-related diseases. 



Modeling Cancer with Stem Cells 

PAGE 240 

Lee et al. establish a model of human familial cancer by deriving induced plurip- 
otent stem cells from Li-Fraumeni syndrome patients. By modeling cancer with 
stem cells, they implicate the imprinted gene network in osteoblast differentia- 
tion defects and osteosarcoma development. 



Fumbling the Cholesterol Hand Off

PAGE 291 

How is cholesterol transported within cells? Chu et al. find that lysosomal 
LDL-cholesterol enhances contacts between lysosomes and peroxisomes to 
facilitate cholesterol transport and that cells from patients with peroxisomal 
disorders exhibit massive cholesterol accumulation. 

A Brief History of ALS



In a particularly poignant scene in the movie The Theory of
Everything, a 21-year-old Stephen Hawking, full of excitement
toward unlocking the secrets of the universe, learns
of his debilitating diagnosis of amyotrophic lateral sclerosis 
(ALS). The year is 1963. He asks his doctor whether his brain 
will still be able to function in a landscape where the neurons 
controlling every aspect of his movement are rapidly dying. 
“Yes,” says the doctor, unsure of what respite this piece of information
could possibly give a young man facing a tragically
short life ahead of him. Cut to 2015, and we are
perhaps no closer to identifying the definitive cure for this 
particularly brutal but most common form of motor neuron 
disease. Only 10% of ALS cases are familial, and we do not 
yet know what the cause or causes can be for the majority 
of others. After diagnosis, patients typically do not survive 
any longer than 2-5 years. In this era of sequencing and big 
data, what we do have, however, is an increasingly useful 
treasure trove of information that can be mined to dissect 
out the complexity of this disease. 




Charting neuronal function and connections over space and time. 
Image from iStock.com/alexovicsattila. 



Mutations in SOD1 that lead to misfolding and intracellular
aggregation of the enzyme are associated with 20%
of familial cases of ALS (Robberecht and Philips, 2013). 
Although the best studied, the wealth of data on SOD1 
has not brought the field much closer toward a viable 
therapeutic solution, largely due to the differences be- 
tween disease progression in humans and animal models. 
In fact, there is a clear dearth of excellent animal models 
that are not based on overexpression of mutant proteins, 
and while iPSC-based methods have been helpful to 
model human disease, the need to study disease onset 
and progression in vivo has never been greater. Over
the past two decades, several other mutant genes have been
linked to ALS, including those like SOD1 that aggregate 
and/or compromise cellular proteostasis, such as FUS,
ubiquilin2, and sequestosome (Robberecht and Philips,
2013). 

Exome-sequencing studies, in particular, have proven to be 
invaluable in the identification of hereditary as well as de novo 
mutations associated with neurodevelopmental and neuro- 
degenerative disorders. From the angle of familial ALS, Wu 
et al. identified profilin 1 in two independent familial cases 
of ALS (Wu et al., 2012) while Smith et al. sequenced 363 pa- 
tient exomes to provide strong evidence for tubulin alpha 4a 
gene mutations (Smith et al., 2014). Both of these papers 
reveal a role for cytoskeletal regulators in driving disease. 
Aberrant RNA processing and the association of altered 
ribonucleoprotein homeostasis is another major emerging 
concept in ALS research. Mutations in the gene encoding 
the transcriptional and splicing regulator TDP-43 result in 
neuronal inclusions now regarded as a hallmark of the 
disease (Robberecht and Philips, 2013). Whether TDP-43 
inclusions are causative or simply a characteristic of disease 
is not yet clear. In this regard, Johnson et al. identified
mutations in the DNA- and RNA-binding protein and TDP-43
interactor, MATR3, in several cases of familial ALS (Johnson
et al., 2014).

Given, however, that the vast majority of diagnoses are of the
sporadic form, such sequencing efforts should unravel equally, 
if not more valuable, data about what, if not possibly why, 
de novo mutations occur. Chesi and colleagues sequenced 
47 ALS patients as well as their unaffected parents to identify 
a surprising enrichment for chromatin regulators, including 
CREST (Chesi et al., 2013). Most recently, Cirulli and group 
performed one of the largest ever ALS exome-sequencing 
studies by comparing 2,874 patients with over 6,000 
controls (Cirulli et al., 2015). They identify with high confidence
mutations in TANK-binding kinase 1 (TBK1), a kinase that controls
key proteins in the autophagy pathway such as optineurin
and p62, which themselves have been linked to ALS.
Besides the models of altered proteostasis and ribostasis, 
Cirulli et al. propose that perhaps dysregulation of autophagy 
could be a central driving factor for disease progression. 

As “mineable” as sequencing data can be, they often raise
many more questions than answers. The study from Cirulli 
et al. does not share significant overlap of top hits with other 
major sequencing efforts (Cirulli et al., 2015). What could be 
some of the reasons for these differences? Are familial and 
sporadic cases of ALS so very different? Is ALS a single dis- 
ease or a spectrum of related diseases? Do patients need to 
be stratified based upon the severity of disease, age of diag- 
nosis, sex, or any other parameters before such sequencing 
efforts are undertaken? Or do these individual studies put 
forth a broader conceptual point, that perhaps what really 
matters in understanding why motor neurons die is the spe- 
cific pathway or the key biological process affected and not 
the specific genes? Researchers and clinicians are actively 
thinking about these very questions, and when one looks 
back at these sequencing studies in light of other ALS litera- 
ture, an even more pertinent question comes to mind: how 
does one define ALS in a molecular context? Is it a disease 
of altered proteostasis or cellular quality control or is it really 
more to do with RNA processing misregulation? Does it
manifest because signaling events mediating interactions 
between key organelles at the heart of these biological pro- 
cesses, the ER, Golgi, and mitochondria are compromised? 
Or should one zoom out entirely and look at this at the cellular
level to then look at the disease in the context of altered 
axonal function, neurite growth, or defects in excitation 
(Roselli and Caroni, 2015)? Indeed, a recent study has found 
that patient-derived motor neurons demonstrate increased 
membrane excitability irrespective of the mutation type 
(Wainger et al., 2014). 

Collectively, these questions may seem daunting, even 
overwhelming, but in a way, they are also incredibly exciting 
in terms of developing several new frontiers for the explora- 
tion of ALS. Systems biology approaches in congruence 
with clinical and molecular studies will hopefully unravel 
and pare down key nodes that are most important for understanding
as well as targeting the disease. Given that such approaches
are being used to study other neurodegenerative
diseases, it will also be interesting to study if there are shared 
commonalities between types of neurodegenerative disor- 
ders, such as the repeat expansions seen in the gene 
C9orf72 in ALS and frontotemporal dementia as well as 
Ataxin2 in spinocerebellar ataxia and ALS (Robberecht and 
Philips, 2013). 

An enigmatic question that has plagued not just ALS but 
several other neurodegenerative disorders is the issue of 
selectivity. Assuming every cell in the body harbors the dis- 
ease-causing or promoting mutation, why are motor neurons 
alone so vulnerable while other cell types, even other neuronal
classes, are spared? The concept of local insults adding fuel to
a cell type already rather vulnerable, for reasons that are still 
not as clear, has been proposed to explain selective neuronal 
death (Roselli and Caroni, 2015). Perhaps localized neuroin- 
flammation or damage to the local vasculature creates an 
environment where supporting cells fuel instead of healing 
damaged motor neurons. Cirulli et al. note that TBK1 also 
regulates the pro-inflammatory NF-κB pathway and that neuroinflammation
may be an important mechanism to explore in
the progression of disease (Cirulli et al., 2015). 

A comprehensive picture of the disease is clearly imperative
for rational drug design. Riluzole is currently
the main approved drug for the treatment of ALS, but its ben- 
efits are minimal. A number of compounds that showed 
promise in animal models failed clinical trials. A better under- 
standing of what to target, as well as taking into account 
patient to patient variation, will be central to drug discovery 
and development. Having said this, for a relatively rare disease,
ALS is thankfully not unheard of. The ALS Bucket Challenge,
a social-media-fuelled phenomenon, this past year
alone helped to raise both awareness and over $100
million toward the ALS Foundation. Patient care and management
are additional challenges and will continue to be,
especially if therapeutic interventions help to extend overall 
lifespan but without sufficiently retaining motor skills. Yet 
the right medical, technological, and social support can go 
a long way in preserving cognitive capacity as well as the 
overall quality of life of ALS patients, as exemplified by 
none other than Hawking himself, now 73, still solving mys- 
teries of the universe. 

Immunotherapy: The Path to Win the War
on Cancer? 



On Breakthroughs and Evolution 




Suzanne L. Topalian 

Johns Hopkins Kimmel Cancer Center 

Drugs targeting immune checkpoint molecules
such as CTLA-4, PD-1, and PD-L1
are being heralded as a breakthrough 
in oncology. However, “breakthrough” is 
a misnomer belying several decades of 
basic immunology advances and clinical 
trial and error leading up to this point. It 
was only after basic science uncovered 
the protean pathways restraining anti- 
tumor immunity that we could begin 
to unravel how to subvert them. The 
broad activity spectrum of PD-1 pathway 
blockers against multiple cancer types 
has validated this as a “common denomi- 
nator” treatment approach and dispels the 
perception that “immunogenic” cancers 
are anomalies. We are now challenged to 
understand immune resistance mecha- 
nisms utilized by “non-responsive” tumor 
types (e.g., prostate cancer) as well as 
the 50% or more of individuals with 
“responsive” tumor types who do not 
benefit from these drugs. Identifying 
collateral pathways for co-targeting in 
combination treatment regimens requires 
an intellectual leap to consider unex- 
pected intersections between the immune 
system and genetics, epigenetics, and 
metabolism. For instance, tumor muta- 
tional density, a surrogate indicator of neo- 
antigens available for immune recognition, 
correlates with the responsiveness of mel- 
anoma to anti-CTLA-4, and lung cancer to 
anti-PD-1. In the final analysis, teamwork
with cross-fertilization of ideas across 
different scientific disciplines has driven 
the evolution to today’s “breakthroughs” 
and will meet tomorrow’s challenges. 



Central Dogma for Immunotherapy 




Jedd D. Wolchok and Timothy A. Chan 

Memorial Sloan Kettering Cancer Center 

In biology classes, we learned about the 
central dogma of molecular biology— 
DNA makes RNA and RNA makes protein.
We’ve also learned about factors that 
regulate this central process, such as the 
influence of epigenetics, micro-RNAs, 
and mechanisms regulating post-tran- 
scriptional and translational control. 
Despite the fine intricacies, the central 
dogma of molecular biology remains 
intact— inherently elegant and graspable. 
A unifying concept for cancer immu- 
nology, on the other hand, has remained 
elusive until recently. We now have discov- 
ered the existence of molecular mecha- 
nisms of immune surveillance (thanks to 
Bob Schreiber) and that the number and 
quality of immune cells within the tumor 
microenvironment has significant prog- 
nostic impact in a variety of cancers. The 
quantity and quality of so-called “passen- 
ger” mutations in the tumor are also very 
important in determining the likelihood 
of success of immunologic checkpoint 
blockade with CTLA-4 or PD-1 pathway
blocking antibodies. A putative dogma 
therefore is that mutations drive baseline 
immune reactivity and baseline immune 
reactivity is what determines the potential 
for benefit of immune potentiating thera- 
pies. As in molecular biology, there are 
likely to be modifiers, such as inhibitory 
cell populations, hostile microenvironments,
and loss of antigen presenting
capacity. Yet, a unifying concept will 
likely allow the field to further refine its 
approaches and specifically address the 
immunologic needs of individual patients. 



Not Just Another Hallmark 




Ira Mellman 

Genentech 



After an incubation period of nearly 
100 years, cancer immunotherapy has 
emerged as a transformative approach 
to treat a wide variety of cancers. 
Although still early days, immunotherapy 
provides a degree of sustained clinical 
benefit rarely observed with more tradi- 
tional cancer treatments. The excitement 
is, therefore, being largely driven by 
clinical results rather than by “break- 
throughs” in the laboratory. There are 
nevertheless two daunting challenges. 
First, the field has progressed so rapidly 
in the clinic that our understanding of 
the underlying basic science and mechanisms
of action is remarkably thin.
Second, the tools we have to assess 
mechanism and correlates of treatment 
response (or lack thereof) remain rudi- 
mentary. Meeting these challenges is crit- 
ical, since only a minority of patients 
as yet exhibit maximal benefit from immu- 
notherapy. Importantly, clinical responses 
to agents such as anti-PD-L1/PD-1 are 
often clear and dramatic, thereby creating 
the opportunity to discover biomarkers 
and use them to understand inevitable 
patient to patient variations. Exploiting 
these correlates of clinical response will 
provide insights into basic cancer biology 
and inform immunotherapy combinations 
that can be expected to result in higher 
response rates and disease cures. Our 
task will be to backfill the science behind 
an exciting and validated therapeutic 
approach, ensuring that the field can 
look forward to a very exciting next 
decade both in the lab and in the clinic. 

New Trends in Cancer Vaccines




Karolina Palucka and Jacques Banchereau 

The Jackson Laboratory for Genomic Medicine 

Clinical responses to checkpoint 
blockade are linked to the presence of 
T cell immunity to cancer-specific muta- 
tions. One way to increase the rate of 
clinical responses is to use vaccination 
to expand T cells specific for cancer 
mutations. Several phase III clinical trials 
testing different cancer vaccine candi- 
dates are currently ongoing. Exogenous 
vaccines utilize, for instance, dendritic 
cell-based and viral vector-based approaches
to boost the immune response
in cancer patients. To be successful, 
these platforms will require applying 
high-throughput genomics to identify 
cancer-specific mutations and candidate 
peptide antigens in each patient in 
order to produce personalized vaccines. 
An alternative approach, endogenous 
vaccination, is based on exploiting the 
local release of antigens that happens 
upon standard cancer therapy (chemo- 
therapy or radiotherapy) or oncolytic viral 
therapy. However, this strategy requires 
endogenous antigen presentation to be 
effective, in order to generate therapeutic 
T cell immunity. Dendritic cells are often 
skewed by tumors to generate pro-tumor 
immunity and thus reprogramming of 
their function in vivo is critical for the 
success of endogenous vaccination. 
Increasing the understanding of cancer 
genomics, the biology of antigen pre- 
sentation and T cell biology will enable 
development of next-generation cancer 
vaccines which, combined with check- 
point blockade inhibitors, will pave the 
path to curative therapies for patients 
with cancer. 



Personalized Immunotherapy 




Steven A. Rosenberg 

National Cancer Institute, NIH 



Adoptive cell transfer (ACT) uses a patient’s own
lymphocytes to treat their autologous 
cancer. When tumor infiltrating lympho- 
cytes are used for ACT, they can mediate 
complete, likely curative, regression 
in patients with metastatic melanoma. 
Lymphocytes genetically engineered to 
express anti-tumor receptors can treat 
patients with refractory lymphomas and 
leukemias. However, the majority of 
metastatic epithelial cancers are still 
resistant to immunotherapy. Recent ap- 
proaches using deep exome sequencing 
along with high-throughput immunologic 
testing opened the door to treat these 
common types of cancer and to identify 
the rare somatic mutations that are immu- 
nogenic. ACT targeting these mutations 
is the ultimate “personalized” cancer 
treatment but is contrary to the mantra 
of many pharmaceutical companies who 
want “drugs in a vial” that can be mass 
produced and distributed. Although this 
approach has produced drugs that pro- 
long the life of patients with solid meta- 
static cancers, curative treatments are 
rare and have had limited impact on 
overall death rates from cancer. A highly 
“personalized” immunotherapy for com- 
mon cancers may require the develop- 
ment of a unique drug (autologous lym- 
phocytes) for each patient. It will also 
need major changes and considerable 
flexibility on the part of industry. The 
effectiveness of treatment should trump 
simplicity of production and convenience 
of administration if progress is to be 
made in enabling patients with metastatic 
cancer to be cured and relish a normal 
lifespan. 



The New Immune Engineers 




K. Dane Wittrup 

Koch Institute for Integrative Cancer Research, MIT 

What vaccine best exploits the evolu- 
tionary weaknesses of a virus or a tumor’s 
mutations? What is the intratumoral 
exposure history of intravenously injected 
antibodies? How does deregulated 
signaling tip the balance from healthy 
homeostasis to autoimmune disease? 
Can native T cell tropism overcome 
physical barriers to macromolecular drug 
delivery? How does our antibody reper- 
toire respond to therapy or disease? 
How might innate and adaptive immuno- 
therapies best be combined? How does 
lymphatic transport actively regulate 
adaptive immunity? Can injectable bio- 
materials program an effective anti-tumor 
immune response? A common thread 
through these varied and timely topics is 
the engagement of biological, chemical, 
and materials engineers at the forefront. 
At their disposal is an analytical toolkit 
honed to solve problems in the petro- 
chemical and materials industries, which 
share the presence of complex reaction 
networks and convective and diffusive 
molecular transport. Powerful synthetic 
capabilities have also been crafted: 
binding proteins can be engineered 
with effectively arbitrary specificity and 
affinity, and multifunctional nanoparticles 
and gels have been designed to interact 
in highly specific fashions with cells and 
tissues. Fearless pursuit of knowledge 
and solutions across disciplinary bound- 
aries characterizes this nascent discipline 
of immune engineering, synergizing with 
immunologists and clinicians to put 
immunotherapy into practice. 

Ronald J. Konopka (1947-2015)



Ron Konopka was found dead of an
apparent heart attack in his Pasadena,
CA home on February 14, 2015. Konopka
was my close contemporary and began 
graduate school at Caltech in 1967. 
He published his thesis work along with 
his mentor Seymour Benzer in what is 
perhaps the single most influential paper 
in circadian rhythms (Konopka and 
Benzer, PNAS 68, 2112-2116). The field 
has spent much of the subsequent 45 
years deciphering the meaning and vali- 
dating (over and over again) the impor- 
tance of this Rosetta stone. It began the 
modern era of circadian biology and is 
the cornerstone of my own circadian 
career. As if this were not enough, it is 
arguably the landmark paper in behavioral 
genetics writ large. 

Benzer moved from Purdue to Caltech 
in the mid-60s and began this field; the 
physical move paralleled an intellectual 
move from prokaryotic genes to the un- 
derpinnings of behavior. He is properly 
credited with combining simple behav- 
ioral screens with the power of Drosophila 
genetics. The strategy could associate 
single mutations and the underlying 
genes with a behavioral phenotype. 
Although Benzer accumulated a coterie 
of talented students and post-docs to 
join him in this grand adventure, Konopka 
was the first. Moreover, he brought 
the circadian problem to Benzer rather 
than vice versa, and Ron designed as 
well as carried out the primary screen 
used to search for circadian mutants. 
The clock causes adult flies to eclose 
(emerge from the pupal case) at or shortly 
after dawn; this rhythmic emergence con- 
tinues in constant darkness, with about 
24 hr periodicity. The screen therefore 
searched for mutant flies that eclose in 
aberrant fashion and was remarkably 
successful. Ron found a short period 
mutant (about 20 hr), a long period mutant 
(about 30 hr) and an arrhythmic mutant. 

Three striking features of the 1971 
Konopka and Benzer paper led them to 
propose that the mutants were central 
to circadian rhythms. First, the three 
mutants affected not only the eclosion 
rhythm but also an independent circadian 
rhythm feature, locomotor activity, which 
also exhibited a short period, a long
period, or arrhythmicity. Second, genetic
analysis indicated that all three mutations 
were alleles of a single gene, which they 
named period. The more expected result 
would have been three different genes 
each giving rise to the very different 
circadian phenotypes of fast, slow, or no 
rhythm; the finding of a single gene 
suggested that only a small number of 
gene products might be running the circa- 
dian clock. Third and most intriguingly, 
the results indicated that this single pro- 
tein was of key importance for circadian 
timing, as it could mutate to a fast-running 
protein (short period) or a slow-running 
protein (long period) as well as being 
necessary for rhythmicity. 

It took another 15 years for recombinant
DNA and DNA sequencing to allow
molecular characterization of the period 
gene and its protein, which verified 
some of these much earlier implications. 
For example, the short and long period 
alleles were determined to be missense 
mutations that altered the protein, 
whereas the arrhythmic mutation was a 
stop codon that prevented synthesis of 
the protein. Subsequent dynamic assays
from many labs continue to this day and
show that the short and long period alleles 
really do speed up and slow down the 
clock pace in ways that are being under- 
stood in considerable mechanistic detail. 
The period protein is also conserved in 
mammals. Although there are certainly 
some functional differences between the 
mammalian period proteins and the fly 
protein, one cannot overstate the extent 
to which the conclusions from Konopka 
and Benzer (1971)— drawn strictly from 
phenotypic and genetic studies— were 
prescient for the entire circadian field 
and all of its subsequent molecular so- 
phistication. 

Konopka did a post-doc at Stanford 
with the circadian biology pioneer Colin 
Pittendrigh and then was hired back at 
Caltech as an Assistant Professor in 1974.

Although publication requirements were 
much less onerous 40 years ago than 
today, Konopka was denied tenure based 
on his thin publication record from those 
assistant professor years. Nonetheless, 
important work from his lab was published 
at the end of his Caltech stay. Although 
these papers substantially added to the 
characterization of the period gene and 
its importance to circadian biology, they 
were deemed too late or insufficient to 
impact the tenure decision. 

Konopka moved to Clarkson University 
in the early ’80s. He had maintained a 
warm relationship with my long-time 
Brandeis collaborator Jeff Hall since their 
Caltech days and was important to our 
initial efforts to clone and identify the 
period gene. We were amateurs in the as- 
says of locomotor rhythms, and Ron 
made sure that our first transgenic flies 
with wild-type period DNA constructs 
were indeed rhythmic. So we had truly 
rescued the arrhythmic behavior of the 
mutant host strain and had the gene in 
hand. Konopka continued to publish and 
was on track to receive tenure at Clark- 
son, but his promotion was apparently 
derailed by changed academic priorities 
at the university. He returned in 1990 to 
the small Pasadena house he had pur- 
chased while at Caltech. 

Although Ron spent his last 25 years 
out of academic science, he began tutor- 
ing high school students in math and sci- 
ence after his return to Pasadena. Ac- 
cording to his friend and former Benzer 
post-doc Larry Kauvar, “he was genuinely 
fascinated by what makes science hard
for some people and easy for others.” 
This long-standing commitment to teach- 
ing, along with a sardonic wit and broad 
interests, also contributed to his popu- 
larity as a Caltech professor, including 
by non-biologists. His hobbies included 
a first-rate butterfly collection as well as 
perhaps a thousand Grateful Dead con- 
cert tapes. 

Few people know that Ron also played 
a seminal role in the beginnings of the He- 
reditary Disease Foundation. Milton Wex- 
ler, a psychoanalyst in Los Angeles, had 
begun to search for ways to attack Hun- 
tington’s disease (HD), an illness that 
affected his wife’s family. Wexler con- 
sulted with Benzer, who proposed in 
1971 that Wexler hire his then 23-year- 
old graduate student Konopka. His task 
was to seek out talented people to attend 
a workshop and potentially pursue
research on HD. Konopka was so successful
that Wexler hired him as the first
Scientific Director of the organization 
that eventually became the Hereditary 
Disease Foundation. According to Alice 
Wexler, “Ron filled this post with his char- 
acteristic imagination and intelligence for 
several years. He played a wonderfully 
creative role in the history of the Heredi- 
tary Disease Foundation, and his legacy 
lives on to this day.” 

Although Konopka participated only 
marginally in the molecular revolution 
that overtook behavioral genetics and fu- 
eled the remarkable progress of the circa- 
dian field since the mid-80s, his initial 
work was essential. The same is true for 
precious few researchers. Indeed, most 
scientists would fail the “deletion-test,” 
a term coined by Gerry Rubin to describe 
a scientist’s contributions by imagining 
what the field would be like had he/she 



not existed. The same cannot be said of 
Konopka and his bold, revolutionary 
screen. That paper proved a very hard 
act to follow. 

Sydney Brenner, and apparently J.D. 
Bernal before him, compared science to 
chess. They emphasized that the two 
games most worth playing are the open- 
ing game and the end game. Konopka 
and Benzer played the ultimate opening 
game. As Benzer died in 2007, Ron 
Konopka’s death closes this remarkable 
and singular chapter in the history of 
circadian rhythms, sadly the end of the 
beginning. 
While some behavioral responses to a stimulus are invariant in animals, most are more likely to be 
variable or stochastic. In this issue, Gordus et al. illuminate a set of combinatorial neuronal activities 
that control the variability of a chemotactic behavior in response to an odor, providing a tractable 
system for understanding how stochastic circuit dynamics affect behavior. 



I was walking in a nearby desert some 
years ago on a beautiful Spring day, 
with wildflowers blooming all around 
and a breeze wafting from across a 
nearby grapefruit orchard in full bloom. 
The scent was transcendent, but every 
now and then, the breeze would slow 
down or change direction and the 
wonderful aroma would diminish or 
disappear. I found myself searching for 
the smell and soon realized that subcon- 
sciously I was employing an optimal strat- 
egy for finding patchy, unpredictably 
distributed targets. The strategy involves 
random back-and-forth searches and 
has been documented for predator- 
prey, pollinator-flower, and mating part- 
ner searches by such animals as rein- 
deer, jackals, honey bees, seals, spider 
monkeys, microzooplankton, and Peru- 
vian fishermen (Bertrand et al., 2007). 
Neuroscientists have noticed that 
learning often shows randomness both 
in the behavior and in electrophysiolog- 
ical recordings (Turner and Brainard, 
2007), but its source has been unknown. 
Until now. The research group headed 
by Cori Bargmann at the Rockefeller 
University (Gordus et al., 2015) has 
used the odor-searching behavior of the 
nematode worm Caenorhabditis elegans 
to track down the site and mechanism 
of this kind of behavioral variability. 

Similar to my grapefruit odor seeking 
behavior, C. elegans (using only a few 
of its neurons) pursues attractive odors 
by moving forward as long as the inten- 
sity of the pleasant odor remains the 
same or increases but then changes or 
reverses directions if the gradient de- 
creases (Pierce-Shimomura et al., 1999). 
The Bargmann lab folks studied the 
neurons that produce a reversal motion. 
The network that they studied consists 



of just four pairs of neurons: chemosen- 
sory neurons (called AWC); motor output, 
or “reversal command” neurons (AVA); 
and two kinds of interneurons (AIB and 
RIM) in between (Figure 1). 
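
The forward-or-reverse rule described above can be written down in a few lines. The sketch below is a deliberately deterministic caricature, not the authors' model: the function names and the sample odor readings are hypothetical, and the real worm's decision is probabilistic rather than all-or-none.

```python
def next_action(previous_odor: float, current_odor: float) -> str:
    """Return "forward" if the odor is steady or rising, otherwise "reverse"."""
    return "forward" if current_odor >= previous_odor else "reverse"


def actions_along_path(odor_samples):
    """Apply the rule to each pair of successive odor readings along a path."""
    return [
        next_action(previous, current)
        for previous, current in zip(odor_samples, odor_samples[1:])
    ]


# Hypothetical odor intensities sampled as the animal moves.
print(actions_along_path([1.0, 2.0, 2.0, 1.5]))
```

A rule this rigid would reverse on every dip in concentration; the variability discussed in the rest of this Preview is what separates the real circuit from such a sketch.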

This circuit suggests a mostly feedfor- 
ward activation of the reversal motor 
neurons, and indeed, all eight neurons 
are activated whenever a reversal 
occurs. In addition, activating any one of 
the neurons optogenetically causes the 
number of reversals to increase. This 
result suggested to the Bargmann group 
that this circuit has a built-in variability 
generator, triggering random reversals of 
forward undulations. 

To ensure that any variability in 
neuronal activity patterns was not due 
to diminished sensory perception but, 
rather, to circuit dynamics, the authors 
applied saturating concentrations of 
the odorant, which decreases the rate of 
reversals. Remarkably, the response to 
the odor did not remain constant for the 
whole duration of stimulus presentation 
(30-60 s) but instead flickered, jumping 
back and forth between no response 
(“off” state) and full response (“on” 
state). Even more remarkably, the whole 
circuit often flickered off and on at the 
same times. This flickering was corre- 
lated with the variability of the network’s 
output— the activation of the reversal 
command neurons. This correlation 
motivated the authors to determine 
which neurons were responsible for the 
variability. 
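
One way to picture this flickering is as a two-state process that switches at random between "on" and "off." The sketch below is only an illustration under stated assumptions: the switching probabilities, the starting state, and the function names are hypothetical and are not taken from Gordus et al.

```python
import random


def simulate_flicker(n_steps, p_on_to_off, p_off_to_on, seed=0):
    """Simulate a two-state ("on"/"off") response that flickers over time.

    At each time step the current state is recorded, and then the state
    switches with the given probability. Starting in the "on" state is an
    arbitrary assumption made for this illustration.
    """
    rng = random.Random(seed)  # seeded so that a run is reproducible
    state = "on"
    trace = []
    for _ in range(n_steps):
        trace.append(state)
        if state == "on":
            if rng.random() < p_on_to_off:
                state = "off"
        elif rng.random() < p_off_to_on:
            state = "on"
    return trace


def count_switches(trace):
    """Count how many times the state changes between consecutive steps."""
    return sum(a != b for a, b in zip(trace, trace[1:]))


# Hypothetical switching probabilities per time step.
trace = simulate_flicker(n_steps=60, p_on_to_off=0.1, p_off_to_on=0.2)
print(count_switches(trace))
```

In such a picture, making the response "more reliable" corresponds to lowering the probability of leaving one of the two states.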

Using a variety of techniques, the inter- 
neurons (AIB and RIM) and the reversal 
command neurons (AVA) were silenced, 
either individually (e.g., the pair of RIM 
interneurons) or in pairs (e.g., both AIB 
and RIM pairs). Strikingly, removing either 
type of interneuron made the behavior 



more reliable: the response to the 
attractive odor more reliably inhibited the 
command neurons, and ablating both 
pairs of interneurons made the command 
neurons’ responses to the odor extremely 
reliable. These experiments were done on 
restrained worms using Ca²⁺ imaging to 
monitor neuronal activity, but the effect 
of eliminating one of the interneuron 
pairs— RIM— was confirmed to make the 
response to the odor more reliable in 
freely moving worms. 

Although eliminating each of the inter- 
neurons has similar effects on the net- 
work’s output, their functions are not 
redundant. For instance, silencing the first 
interneuron in the chain (AIB) stabilizes 
the “off” state in the rest of the network, 
whereas silencing the second pair of in- 
terneurons (RIM) decreased the correla- 
tion of the flickers between the remaining 
interneurons (AIB) and the reversal com- 
mand neurons (AVA). These results would 
not be predicted by the feedforward con- 
nections (AWC => AIB => RIM => AVA). 
Instead, these (and other) findings 
strongly suggest that the variability de- 
pends upon the feedback connections 
(from RIM to AIB and from AVA to AIB). 
Not surprisingly, eliminating chemical 
synaptic transmission in either pair of 
interneurons had the same effect on 
reducing variability as did silencing these 
neurons, indicating that it was the chemi- 
cal synapses, not the electrical ones, that 
are responsible for the variability in the 
response. 

Exactly how this network produces 
the variability is not clear. In part, that’s 
because the valence (excitatory or inhibi- 
tory) of the chemical synaptic connec- 
tions is not entirely clear, especially 
in the feedback connections. For 
instance, the interneuron RIM releases 
Figure 1. A Variability Generator Built into a Neuronal Circuit 

Four types of neurons (each type is paired) that influence reflexive reversal of 
locomotion, as well as their interconnections, are depicted. The names of the 
cells are based upon the location of their cell bodies in the nervous system, 
and the synaptic connections were determined from serial EM studies (White 
et al., 1986). The chemical synaptic contacts are likely to be mostly excitatory, 
acting on the neuron contacted by the filled circle. The electrical synapses 
allow electrical currents to pass both ways between neurons. The dashed 
connection was determined by experiments in this study, which may be in- 
direct, through neurons not shown in this figure. The upward arrows from the 
AWC chemoreceptors indicate that removing an odor is the effective stimulus 
for eliciting reversals. The two states of the system shown are two of the three most 
common states of the circuit found immediately after odor is removed, 
thereby activating the chemosensory neuron (AWC). Color in a neuron means 
that it is in an activated state. In the “AIB-only” activated state, the odor 
essentially always elicits a behavioral reversal; hence, this state is reliable. 
When both pairs of interneurons (AIB and RIM) are active, the response 
becomes variable. (The third state— with AIB, RIM, and AVA all off— is also 
reliable.) These results indicate that the interneurons are the source of the 
variability in the response. 



three different neurotransmit- 
ters (glutamate, acetylcho- 
line, and tyramine), and the 
interneuron AIB releases only 
glutamate, but RIM (its pri- 
mary target) expresses both 
excitatory and inhibitory re- 
ceptors to this neurotrans- 
mitter. Doing electrophysi- 
ology in C. elegans is 
devilishly difficult, so working 
out the valence, strength, 
and temporal properties 
in these synaptic connec- 
tions— as well as the inherent 
membrane properties— un- 
derlying the flickering awaits 
future studies. 

It is interesting to consider 
why evolution might have 
inserted two layers of neurons 
in a circuit just to make 
that circuit’s function less 
reliable. It could be entirely 
to implement the afore- 
mentioned optimal search 
pattern for food (Bartumeus 
et al., 2002), but it may have 
other functions too, such as 
providing a substrate for 
behavioral plasticity (Turner 
and Brainard, 2007). In addi- 
tion, the circuit shown in 
Figure 1 is embedded in 
more complex circuits in the 
worm’s nervous system, 
such that the interneurons 
may be active in other behav- 
iors. These neurons could 
act as traffic police, pointing 
neuronal activity toward dif- 
ferent commanded behav- 
iors. In line with this notion is 
the logical algorithm suggested by Gor- 
dus et al. for triggering reversals: the 
output state of the system (i.e., whether 
the reversal command neurons are acti- 
vated or inactivated) depends upon the 
state of co-activation of the interneurons 
and the command neurons. The authors 
conclude that a reasonable hypothesis 
for explaining their data is that the flick- 



ering activity states of the neuronal 
network act like attractors, pulling the 
network into a particular combination of 
activity states that initiates and maintains 
the reversal behavior. Interestingly, the 
importance of attractor states in behav- 
ioral choice in both invertebrates (Brigg- 
man and Kristan, 2008) and vertebrates 
(Churchland et al., 2012; Mante et al., 



2013) has been recognized in 
recent years. Having this 
strategy present in such a 
simple nervous system as 
C. elegans is intriguing from 
an evolutionary point of view 
as well as for systems neuro- 
biology, as it provides a 
comprehensible circuit for 
testing ideas about how at- 
tractor systems are put 
together and how they affect 
behavioral outcomes. Indeed, 
many other questions about 
combinatorial circuit dy- 
namics and how and why 
they influence variable behav- 
ioral output in even broader 
contexts may now seem less 
daunting to tackle with the 
elegant system and concep- 
tual framework provided by 
Gordus et al. 
An epidemic of leukemia among bivalve molluscs is spreading along the Atlantic coast of North 
America, with a serious population decline of soft-shelled clams. In this issue of Cell, Metzger 
et al. use forensic DNA markers to demonstrate that the leukemia cells have a clonal origin and 
appear to be transmitted through sea water. 



Those of us who have enjoyed clam bakes 
on the beach at Cold Spring Harbor Lab- 
oratory may soon find that this mode 
of networking among scientists has 
become a thing of the past. The soft- 
shelled clam (Mya arenaria) is suffering a 
population dive owing to the spread of a 
lethal leukemia-like cell. Similar leukemias 
have been observed in other species of 
edible bivalve molluscs such as mussels, 
cockles, and oysters that are farmed 
along the Atlantic littoral of North America. 
The disease in soft-shelled clams now 
ranges from Chesapeake Bay to the 
Canadian province of Prince Edward 
Island 1,500 km to the northeast. Importation 
of clam stocks by shellfish farmers 
may have exacerbated its spread. 

Steve Goff’s laboratory at Columbia 
University, working with ecologists at 
Environment Canada, had previously 
identified a retrotransposon in the clams 
called Steamer, which is greatly amplified 
in the leukemic cells (Arriagada et al., 
2014). Using genetic analysis of Steamer 
integration sites, mitochondrial single- 
nucleotide polymorphisms, and microsat- 
ellite variation, they now show that the 
leukemia has a monoclonal origin sharing 
common alleles that are different from 
their hosts (Metzger et al., 2015). In this 
respect, the leukemia in clams is similar 
to the Devil facial tumor disease (DFTD) 
of the marsupial carnivore, the Tasmanian 
devil, whose survival is endangered by 
rapidly spreading infection of a clonal 
neuro-endocrine cancer (Pearse and 
Swift, 2006; Murchison et al., 2012), and 
to canine transmissible venereal tumor 
(CTVT), which has a worldwide distribu- 
tion in dogs (Murgia et al., 2006; Murchi- 
son et al., 2014). 

DFTD is spread by biting, whereas 
CTVT, as its name implies, is sexually 



transmitted. The mode of transmission 
of the clam leukemia is not yet firmly 
established, but it is likely that these filter 
feeders take up tumor cells from the 
seawater through their siphons (Figure 1) 
and that the cells then parasitize the new 
host. When the leukemic clone of clams 
first emerged is unknown, but since the 
disease was noted in the late 1970s, it 
must be at least 40 years old. DFTD was 
first recorded in Tasmanian devils in 
1996 (Pearse and Swift, 2006), whereas 
the venereal tumor in dogs is estimated 
to date from an ancient breed like the 
husky some 11,000 years ago (Murchison 
et al., 2014). Of course, even a 10,000 
year period is a snapshot on the evolu- 
tionary timescale of the hosts, and one 
can speculate how many other cases of 
tumor cells evolving into parasites may 
have occurred in species that may now 
be extinct. 

In addition to the three naturally occur- 
ring transmissible tumors of clams, dogs, 
and devils, there are several case his- 
tories of human malignancy arising from 
occult tumor cells in donor organ or tissue 
transplants that then emerge in the immu- 
nosuppressed transplant recipient (Mur- 
gia et al., 2006; Siddle and Kaufman, 
2015). There are also cases of transpla- 
cental tumor transmission from mother 
to child and between twins in utero. 
Among inbred strains of laboratory 
rodents, there would be an opportunity 
for tumor cells to spread from one individ- 
ual to another without crossing a major 
histocompatibility (MHC) barrier. How- 
ever, only one example has been docu- 
mented— that of leukemia in a colony of 
Syrian hamsters at NIH 50 years ago; the 
leukemic cells could even be transmitted 
by mosquitoes (Banfield et al., 1965), pre- 
sumably by passive transfer on the mouth 



parts rather than by undergoing a replica- 
tion cycle in the insect host. 

How do transmissible tumors manage 
to overcome histocompatibility barriers? 
It appears that there are a variety of mech- 
anisms, including downregulation of class 
I and class II MHC genes and secretion 
of immunosuppressive cytokines (Belov, 
2012; Siddle and Kaufman, 2015). In 
CTVT, there is a fine balance between 
progressive disease without an anti- 
tumor immune response and regression 
when allograft rejection kicks in. In 
DFTD, the tumor is relentlessly progressive. 
Both CTVT and DFTD have close to 
homozygous MHC class I alleles, and 
their initial emergence may have been 
facilitated by a relatively inbred host 
population with limited MHC diversity. 
Invertebrates like clams do not have 
as sophisticated a tissue recognition 
system as the MHC of higher vertebrates, 
yet certain cell surface molecules help 
to distinguish between self and non- 
self. However, non-malignant somatic 
cell invasion and even germ cell para- 
sitism has been documented in marine in- 
vertebrates such as colonial tunicates 
(Rinkevich, 2011). Metzger et al. (2015) 
suggest that the lack of an MHC system 
may make molluscs more susceptible 
than vertebrates to transmissible tumors. 
As CTVT is the only known transmissible 
cancer that can regress, understanding 
what triggers rejection may be key to 
inducing regression of other transmissible 
tumors and perhaps non-transmissible 
cancers too. 

As Metzger et al. (2015) document in 
their Introduction, leukemia occurs not 
only in soft-shelled clams, but also 
in other bivalve molluscs in the same 
region of North America. This observa- 
tion raises the question of whether the 
tumors in other species represent cross-species 
infections from Mya arenaria or 
whether each host species has evolved 
its own transmissible tumor. CTVT is 
known to be readily transplantable 
experimentally to other canid species 
and even to foxes (Murgia et al., 2006; 
Belov, 2012), and it will be easy to deter- 
mine the species of origin of transmis- 
sible leukemias in other bivalves. If 
each tumor is species specific, what 
environmental factors may have facili- 



tated their independent emergence 
around the same time and place? 

How might transmissible tumors be 
contained to reduce the threat to their 
host species? CTVT in dogs has spread 
to all five continents, but it is self-limiting 
and presents no danger to the host spe- 
cies as a whole. In India, the high inci- 
dence of CTVT in street dogs has been 
diminished in some cities by castration. 
To save the Tasmanian devil from extinc- 
tion, a possible tumor vaccine is being 



explored as well as containment of 
healthy animals on islands and on penin- 
sulas with barrier fences to keep out 
affected devils (Belov, 2012). For the 
soft-shelled clam, it will be important to 
be vigilant to stop importation of clams 
from affected areas to currently unaf- 
fected ones like Florida. 

In the meantime, it is safe for humans to 
eat clams, even raw oysters. Enjoy them 
while you can! 

REFERENCES 

Arriagada, G., Metzger, M.J., Muttray, A.F., Sherry, 
J., Reinisch, C., Street, C., Lipkin, W.I., and Goff, 
S.P. (2014). Proc. Natl. Acad. Sci. USA 111, 
14175-14180. 

Banfield, W.G., Woke, P.A., Mackay, C.M., and 
Cooper, H.L. (1965). Science 148, 1239-1240. 

Belov, K. (2012). BioEssays 34, 285-292. 

Metzger, M.J., Reinisch, C., Sherry, J., and Goff, 
S.P. (2015). Cell 161, this issue, 255-263. 

Murchison, E.P., Schulz-Trieglaff, O.B., Ning, Z., 
Alexandrov, L.B., Bauer, M.J., Fu, B., Hims, M., 
Ding, Z., Ivakhno, S., Stewart, C., et al. (2012). 
Cell 148, 780-791. 

Murchison, E.P., Wedge, D.C., Alexandrov, L.B., 
Fu, B., Martincorena, I., Ning, Z., Tubio, J.M., 
Werner, E.I., Allen, J., De Nardi, A.B., et al. 
(2014). Science 343, 437-440. 

Murgia, C., Pritchard, J.K., Kim, S.Y., Fassati, A., 
and Weiss, R.A. (2006). Cell 126, 477-487. 

Pearse, A.M., and Swift, K. (2006). Nature 439, 549. 

Rinkevich, B. (2011). Chimerism 2, 1-5. 

Siddle, H.V., and Kaufman, J. (2015). Immunology 
144, 11-20. 



192 Cell 161, April 9, 2015 ©2015 Elsevier Inc. 




Leading Edge 

Previews 



Gut Microbiota: The Link 
to Your Second Brain 



Cell 



Vanessa Ridaura1,2 and Yasmine Belkaid1,2,* 

1Program in Barrier Immunity and Repair 
2Mucosal Immunology Section, Laboratory of Parasitic Diseases 
National Institute of Allergy and Infectious Disease, NIH, Bethesda, MD 20892, USA 
*Correspondence: ybelkaid@niaid.nih.gov 
http://dx.doi.org/10.1016/j.cell.2015.03.033 



Serotonin is a highly ubiquitous signaling molecule that plays a role in the regulation of various 
physiological functions. Several lines of evidence, including the present work from Hsiao and col- 
leagues, demonstrate that, in the gut, microbial-derived metabolites affect the production of sero- 
tonin that in turn impacts host physiological functions. 



The body’s epithelial surfaces act as a 
scaffold to sustain diverse communities 
of commensals that include bacteria, 
archaea, fungi, protozoa, and viruses. 
Although the notion that these microbial 
partners can promote human health is 
not a recent concept, the extent to which 
the microbiota controls all physiological 
systems has only recently begun to be 
appreciated. Of particular interest are the 
recent lines of investigation linking the mi- 
crobiota to both the hormonal and nervous 
systems. In this context, a set of recent 
findings, including the work of Hsiao and 
colleagues, uncovers a role for the micro- 
biota in controlling the production of a 
major neurotransmitter, serotonin. 

In 1967, Abrams and Bishop showed 
that animals devoid of live microbes 
(germ-free) had decreased gut motility 
compared to animals harboring a con- 
ventional mouse microbiota (Abrams 
and Bishop, 1967). Decreased serotonin 
levels in the absence of microbes have 
been proposed as a potential cause for 
this defect (Wikoff et al., 2009; Kashyap 
et al., 2013). Serotonin is a highly ubiqui- 
tous signaling molecule that plays a 
fundamental role in the regulation of 
various physiological functions via its ac- 
tion as both a neurotransmitter and a hor- 
mone. The vast majority of serotonin is 
produced in the gut by enterochromaffin 
cells, a specialized subset of cells strate- 
gically positioned to respond to chemical 
and mechanical stimuli (Figure 1). As a 
single molecule, serotonin has a remark- 
ably wide range of effects on host physi- 
ology, ranging from the control of gut 
motility, secretory reflexes, platelet ag- 



gregation, bone development, and car- 
diac function to the regulation of immune 
responses (Mawe and Hoffman, 2013). 
Germ-free mice display depressed levels 
of this neurotransmitter compared to 
conventionally raised animals (Wikoff 
et al., 2009). Colonization of germ-free 
animals with the gut microbiota from 
humans or mice can significantly accel- 
erate gut transit time, an effect that can 
be partially blocked using a pharmaco- 
logic antagonist of a serotonin receptor 
(Kashyap et al., 2013). 

In the present issue of Cell, the work of 
Yano et al. (2015), together with previous 
lines of evidence, proposes a model 
in which microbial-derived metabolites, 
such as short-chain fatty acids (i.e., buty- 
rate and acetate) or secondary bile acids 
(specifically deoxycholate), directly act 
upon enterochromaffin cells (ECs), 
inducing transcription of the rate-limiting 
serotonin biosynthetic enzyme Tph1 
(Yano et al., 2015; Fukumoto et al., 
2003; Reigstad et al., 2014; Figure 1). A 
link between microbiota-derived meta- 
bolic products and enterochromaffin cell 
function has been previously reported by 
Reigstad et al., where they propose that 
short-chain fatty acids can modulate tran- 
scription of Chga gene, which encodes 
the neuroendocrine secretory protein 
chromogranin A, released together with 
serotonin (Figure 1). These results were 
corroborated by Yano et al. and corre- 
lated, in treated mice, with decreased 
transit time. Further, via increase in sero- 
tonin production, the presence of the mi- 
crobiota was associated with increased 
platelet activation and aggregation, find- 



ings that help to explain the improved 
coagulation at the site of injury in animals 
harboring a gut microbiota compared to 
germ-free mice. Of interest is the diversity 
of microbial-derived metabolites that can 
increase serotonin production, support- 
ing the idea that the mechanisms control- 
ling this brain-gut interaction are redun- 
dant and are unlikely to be affected by 
subtle microbial shifts. Because of the 
fundamental importance of serotonin in 
mediating central body functions, such 
redundancy may have been maintained 
as a means to sustain the production of 
this neurotransmitter in the face of consti- 
tutive microbial fluctuation. Nonetheless, 
not all microbes are equally efficient at 
promoting serotonin production. Yano 
et al. correlated the localized changes of 
serotonin with the presence of spore- 
forming bacteria, primarily from the Clos- 
tridium genus. Given the diversity of 
microbially derived metabolites, it is un- 
likely that a defined microbe will be asso- 
ciated with this effect. The mechanism by 
which the microbiota promotes serotonin 
production and, in particular, if and how 
enterochromaffin cells directly respond 
to microbiota-derived products and me- 
tabolites remain unclear. Furthermore, 
the systemic effects of this control on 
distal tissues and peripheral function are 
still unknown. 

Of interest, and yet to be studied, is the 
effect that increased serotonin may have 
on the ecology and metabolism of the 
gut microbiota (Figure 1). Serotonin can 
act directly or indirectly on the immune 
system, which in turn can shape the 
microbiota composition and localization. 
Figure 1. Role of the Microbiota in Serotonin Production 

The gut microbiota influences the number and function of enterochromaffin cells, thereby promoting the 
release of serotonin (5-HT). The microbiota promotes serotonin production via various metabolites, 
including short-chain fatty acid (SCFA). Serotonin can act on various physiological systems promoting gut 
motility, secretory reflexes, and platelet function. The action of serotonin can be local as well as distal (on 
bone development and cardiac function) via platelet-mediated transport. Serotonin can also influence the 
immune system, an effect that could feed back on enterochromaffin cells and the microbiota itself. Defined 
microbes have also been shown to produce serotonin, a pathway that may further link the microbiota and 
host serotonin levels. 



The feedback loop associated with this 
control remains to be explored. Micro- 
bially induced changes in serotonin con- 
centrations may regulate the host’s 
immune response and may subsequently 
influence how the host deals with patho- 
gens or commensals. Of interest, defined 
microbes themselves can produce sero- 
tonin (O’Mahony et al., 2015); this raises 
the question of whether serotonin can 
play a direct role in microbial metabolism 
or in the maintenance of the ecological 
niche of certain phylogenetic groups 
(i.e., spore-forming bacteria). 



An important line of future investigation 
will be to explore how microbiota control 
of serotonin levels contributes to mucosal 
disorders, a question of particular impor- 
tance because of the known role of 
serotonin in promoting immunity and 
inflammation. Of interest, in various 
models of mucosal inflammation or infec- 
tions, as well as in Celiac disease pa- 
tients, serotonin levels are significantly 
increased (Mawe and Hoffman, 2013). 
Further, both antagonists and agonists 
to the serotonin receptor have been 
used to clinically treat a variety of gastro- 



intestinal disorders (Manocha and Khan, 
2012). Based on the present findings, 
further understanding of the mechanism 
by which the microbiota regulates host’s 
serotonin levels may be a first step toward 
developing pro- and/or prebiotic strate- 
gies to complement or potentially replace 
existing clinical treatments. 
How do cells collectively control an organ’s behavior? By plucking various numbers of hairs from 
the mouse skin, Chen et al. show that hairs regenerate only when a sufficiently high density of 
them are plucked. Remarkably, a hair follicle can only regenerate in concert with other follicles, 
but not autonomously. 



A cell can modify its behavior in response 
to signaling molecules secreted by its 
neighboring cells. This cell, in turn, can 
secrete a signaling molecule that changes 
its neighboring cells’ behaviors. In a popu- 
lation of cells, such a dynamic back and 
forth between many cells means that there 
are often no sharp boundaries between 
actions of cells that are autonomous 
and those that influence other cells. This 
blending of many individuals into one 
collective entity, often at a macroscopic 
scale, is a hallmark of multicellular systems 
like tissues, organs, and populations of 
bacteria. It also necessitates quantitative 
analyses to identify the cascades of events 
that yield collective behaviors of cells. In 
this issue of Cell, Chen et al. (2015) use 
mathematical models and experiments 
to reveal that a group of hair follicles 
can “count” how many hairs have been 
plucked from the skin and then regenerate 
the lost hairs if the density of plucked hairs 
is above a certain density (i.e., “threshold 
density”) while not regenerating any hairs 
if the density of plucked hairs is below 
this threshold density (Figure 1A). The authors 
have thus uncovered a rare example 
of quorum sensing, which has mainly been 
studied in bacteria, at the level of a whole 
organ in a live animal. 

Chen et al. (2015) studied hairs that grow 
on the mouse skin. A “hair unit” consists of 
a hair that protrudes from the skin and 
its follicle that lies beneath the skin 
(Figure 1A) (Jahoda and Christiano, 2011). 
The follicle contains the stem cells from 
which a new hair can grow. The authors 
counted and plucked individual hairs from 
mice. By varying the geometry of regions 
on the skin from which the hairs were 
plucked and the number of plucked hairs, 



the authors discovered two scenarios. 
When the density of plucked hairs was 
below a certain threshold density, no folli- 
cles regenerated (Figure 1A). But when 
the density of plucked hairs was above 
the threshold density, both the follicles of 
the plucked hairs and of the surrounding 
intact hairs regenerated (Figure 1A). Thus, 
hair follicles are not autonomous. Instead, 
they collectively decide whether or not 
to regenerate both the lost and intact hair 
follicles. This ability of the follicles to mea- 
sure the density of lost hairs is a form of 
quorum sensing (Ng and Bassler, 2009), 
in which a group of cells “measures” its 
population density and then together 
launch a collective action (i.e., regeneration 
of all hairs) only when the density is high 
enough. Using quorum sensing, the folli- 
cles can ignore harmless minor hair losses 
while using their resources to repair only 
harmful major hair losses. 
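
The all-or-none decision described above amounts to comparing a density with a threshold. The following sketch makes that explicit; the function name, the units, and the numerical threshold are hypothetical placeholders rather than values reported by Chen et al., and the behavior exactly at the threshold is an arbitrary choice here.

```python
def follicles_regenerate(plucked_hairs: int, area_mm2: float,
                         threshold_per_mm2: float) -> bool:
    """Return True if the density of plucked hairs exceeds the threshold.

    When True, every follicle in the region (plucked and intact) is taken to
    regenerate; when False, none do. A density exactly equal to the
    threshold is treated as "below" (an assumption of this sketch).
    """
    if area_mm2 <= 0:
        raise ValueError("area_mm2 must be positive")
    density = plucked_hairs / area_mm2
    return density > threshold_per_mm2


# Hypothetical numbers: the same 20 plucked hairs, spread over two areas.
print(follicles_regenerate(20, area_mm2=2.0, threshold_per_mm2=5.0))   # dense
print(follicles_regenerate(20, area_mm2=10.0, threshold_per_mm2=5.0))  # sparse
```

The point of the comparison is that the same number of plucked hairs can give opposite outcomes depending on the area over which they are spread.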

Remarkably, Chen et al. (2015) discov- 
ered that a field of hair follicles spanning 
macroscopic distances (i.e., several mm) 
could quorum sense. Before degrading 
or being captured by a cell, a typical 
signaling molecule can travel no more 
than about 100 μm. Thus, a “distressed” 
follicle (i.e., a follicle of a plucked hair) 
can potentially use a signaling molecule 
to tell its adjacent follicle, which is typically 
about 100 μm away, of its hair loss. 
However, using a mathematical model, 
the authors deduced that follicles could 
not quorum sense across millimeter dis- 
tances if they could only communicate 
with their immediate neighbors. In fact, 
they found that follicles must secrete a 
signal that traveled over a distance of 
1 mm, at a higher speed than any mole- 
cule could achieve. Turning back to the 



bench, the authors discovered that a 
distressed follicle recruited M1 macrophages 
to it by secreting the attractant 
chemokine CCL2 (Figure 1B). Immunostaining 
revealed that these motile macrophages 
first accumulated around the 
distressed follicles and then around the 
surrounding healthy ones. The macro- 
phages secreted the signaling molecule 
Tnf-α that stimulated the regeneration of 
both the healthy and distressed follicles 
through pathways that remain to be uncovered 
(Figure 1B). Although more work 
is required, the authors’ data suggest 
that an appreciable accumulation of 
macrophages around the distressed folli- 
cles most likely occurs only when the 
density of distressed follicles is above 
the threshold density (Figure 1C). More 
importantly, the authors show that motile 
cells, along with signaling molecules, 
can transmit information over a macro- 
scopic distance between immobile cells. 
An interesting question for the future is if 
such coupling between random diffusion 
of signaling molecules and directed mo- 
tion of signaling cells may underlie collec- 
tive behaviors of other organs. 
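
The length-scale argument above can be illustrated with a back-of-the-envelope estimate. The sketch below assumes simple one-dimensional diffusion, for which the typical time to cover a distance L is roughly L^2/(2D). The diffusion coefficient used here (10 µm^2/s) is an illustrative assumption, not a value reported by Chen et al. (2015), and the function name is hypothetical.

```python
def diffusion_time_s(distance_um: float, d_um2_per_s: float = 10.0) -> float:
    """Rough time (in seconds) for a molecule to diffuse a distance in 1D.

    Uses the scaling t ~ L^2 / (2 D). The default D is an assumed,
    illustrative value, not a measured one.
    """
    if distance_um < 0 or d_um2_per_s <= 0:
        raise ValueError("distance must be >= 0 and D must be > 0")
    return distance_um ** 2 / (2.0 * d_um2_per_s)


neighbor_um = 100.0   # typical spacing between adjacent follicles (from the text)
field_um = 1000.0     # 1 mm, the range over which the follicles quorum sense

t_neighbor = diffusion_time_s(neighbor_um)
t_field = diffusion_time_s(field_um)

# Diffusion time grows with the square of distance, so a 10-fold longer
# path takes 100 times longer, whatever value of D is assumed.
slowdown = t_field / t_neighbor
print(f"100 um: {t_neighbor:.0f} s; 1 mm: {t_field:.0f} s; ratio: {slowdown:.0f}x")
```

The quadratic scaling, rather than the particular numbers, is the point: it suggests why a carrier that moves directionally, such as a motile macrophage, could relay information over millimeters more effectively than a diffusing molecule alone.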

An unresolved mystery is what deter- 
mines the threshold density of hair loss. In 
microbial cells, the threshold that divides 
whether every cell or no cell responds to 
a signaling molecule is primarily deter- 
mined by the binding affinity of the mole- 
cule to its receptor and a positive feedback 
regulation in a genetic circuit that controls 
the cells’ response to the signaling mole- 
cule (Ng and Bassler, 2009; Pai et al., 
2014; De Monte et al., 2007; Rotem et al., 
2010). But, in multicellular systems, it is un- 
clear how coordination of many different 
factors leads to a threshold and a binary 
Figure 1. A Hair Follicle Can Only Regenerate in Concert with Other Follicles, but Not by Itself

(A) When the density of plucked hairs from the mouse skin is lower than a certain density (i.e., “threshold 
density”), no follicles regenerate. If the density of plucked hairs is higher than the threshold, all the follicles 
of plucked hairs (“distressed follicles”) and the follicles of surrounding intact hairs (“healthy follicles”) 
regenerate by entering growth (anagen) phase from a dormant (telogen) phase. 

(B) A distressed hair follicle secretes the cytokine Ccl2. M1 macrophages sense Ccl2 and swim toward the 
distressed follicle. 

(C) Macrophages secrete Tnf-α. Tnf-α activates regeneration of distressed and healthy follicles. Higher
density of distressed follicles leads to a higher density of M1 macrophages recruited to the follicles. 

(D) Main factors that a group of cells may use to collectively make a binary decision. 



response (Figure 1D). In the case of the 
hair follicles, the short range of signaling 
mediated by diffusing molecules (e.g., 
Ccl2 and Tnf-α), the long range of signaling
mediated by the motile macrophages, 
the spatial arrangements of the hair folli- 
cles, and the genetic circuits that control 
each cell’s secretion and response to the 
different signals must all fit together to 
set the threshold density of hair loss and 
a binary multicellular response (i.e., either
every follicle or no follicle regenerates)
(Figure 1D). One possibility is that the macrophages
make a binary decision (i.e.,
either move toward the distressed follicle 
or not) while the follicles are incapable of 
making any binary decisions. Another pos- 
sibility is that a distressed follicle measures 
the density of macrophages surrounding it 
in such a way that it only regenerates when 
there is a sufficiently large density of mac- 
rophages. 
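
The second possibility can be pictured as a steep, switch-like dose-response. The toy model below is only a sketch under stated assumptions: it supposes that macrophage density scales linearly with the density of plucked hairs and that a follicle reads this density through a Hill function. All names, parameters, and units are hypothetical and are not taken from Chen et al. (2015).

```python
def regeneration_response(macrophage_density: float,
                          half_max: float = 1.0,
                          hill_n: float = 8.0) -> float:
    """Hill-type response between 0 and 1 (arbitrary units).

    A larger hill_n gives a sharper, more switch-like transition
    around half_max. Both parameters are illustrative assumptions.
    """
    if macrophage_density < 0:
        raise ValueError("density must be non-negative")
    x = macrophage_density ** hill_n
    return x / (half_max ** hill_n + x)


def follicles_regenerate(plucked_density: float,
                         recruits_per_follicle: float = 1.0,
                         cutoff: float = 0.5) -> bool:
    """Return True if the modeled response exceeds the cutoff.

    Assumes macrophage density is proportional to plucked-hair density.
    """
    macrophages = recruits_per_follicle * plucked_density
    return regeneration_response(macrophages) > cutoff


# Below the assumed threshold nothing regenerates; above it, everything does.
for density in (0.5, 2.0):
    print(density, follicles_regenerate(density))
```

In this sketch the threshold density is set entirely by half_max and recruits_per_follicle, which stand in for the binding affinities, secretion rates, and spatial factors that the text identifies as still unknown.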



Chen et al. (2015) and other recent 
studies (Hart et al., 2014; Sgro et al.,
2015) motivate us to investigate how 
multiple cells, each with its own unique 
genetic circuit, can together achieve a 
collective function (e.g., a group of cells 
making a binary decision) that is analo- 
gous to certain behaviors of unicellular 
genetic circuits (e.g., a bistable genetic 
circuit). A key question is whether there are other
unicellular behaviors, governed
by networks of genes, that multicellular
systems mimic with networks of communicating
cells. A promising way to address
this question is by building genetic circuits 
and cell-cell communications to reveal 
what sorts of multicellular behaviors can 
arise from them (Regot et al., 2011; Youk
and Lim, 2014). An exciting outcome of
this approach might be a realization that 
only a very small collection of genetic cir- 
cuits and cell-cell interactions can yield a 
wide variety of collective behaviors of 
cells in nature. 
Lysosomes provide a major source for cellular cholesterol; however, most of this cholesterol is
trafficked to the plasma membrane via unknown mechanisms. Chu et al. identify an unexpected 
role for peroxisomes in the transport of cholesterol from the lysosome to the plasma membrane 
via a lysosome-peroxisome membrane contact site. 



Cholesterol is an essential determinant 
of membrane fluidity, permeability, and 
organization in animal cells. While the 
vast majority of cellular cholesterol (about 
60%-80%) is localized at the plasma 
membrane (PM) (Maxfield and Wustner, 
2002), cholesterol originates from the ER 
via biosynthesis and the lysosome via 
import of exogenous cholesterol. These 
observations raise a fundamental question:
how is cholesterol transported from
the ER or lysosome to the PM? In this 
issue of Cell, Chu et al. (2015) discover 
that peroxisomes play a critical role in 
the transport of cholesterol from the lyso- 
some to the PM and uncover an unex- 
pected membrane contact site between 
the peroxisome and lysosome (Figure 1). 

Exogenous cholesterol enters the cell 
as low density lipoproteins (LDL) via 
endocytosis of the LDL receptor. Upon 
delivery to the lysosome, LDL-derived 
cholesterol esters are de-esterified into 
free cholesterol then exported to other 
compartments including the PM (Maxfield 
and Wustner, 2002). The physiological 
importance of cholesterol transport out 
of the lysosome is underscored by Nie- 
mann-Pick disease type C (NPC). NPC is 
a fatal, predominantly neurodegenerative 
disorder caused by mutations in NPC1 
or NPC2, which results in cholesterol 
accumulation in the lysosome. NPC1 
and NPC2 work together to transport 
free cholesterol out of the lumen to the 
limiting membrane of the lysosome (Du 
et al., 2011; Kwon et al., 2009; Vanier, 
2015). The molecular mechanisms of subsequent
steps, exit of cholesterol from the
lysosomal membrane and delivery to the 
PM, were largely uncharacterized. 

To identify proteins required for transport
of LDL-derived cholesterol, Chu
et al. (2015) design an elegant screen
that takes advantage of the antibiotic 
Amphotericin B, which permeabilizes
the PM through association with exposed 
cholesterol, as well as U18666A, which 
enables them to stage the release of 
LDL-derived cholesterol from the lyso- 
some. Using shRNA, they identify 341 
candidate genes. Surprisingly, several 
candidates are related to peroxisomal 
function and biogenesis. Knockdown of 
these peroxisome-related genes results
in the accumulation of cholesterol in the 
lysosome lumen. 

Analysis of cultured wild-type cells 
by both fluorescence microscopy and 
transmission electron microscopy reveals 
a previously unrecognized membrane 
contact site between lysosomes and 
peroxisomes. Further evidence for the 
lysosome-peroxisome contact site is 
provided by multiple in vitro studies 
demonstrating an interaction between 
these organelles. Chu et al. find that the 
lysosome-peroxisome contact site is 
bridged, at least in part, by the integral 
lysosomal membrane protein, synapto- 
tagmin 7 (syt7), through binding to 
PI(4,5)P2 on the peroxisomal membrane. 
The lysosome-peroxisome contact site 
is transient and cholesterol-dependent. 
Notably, efficient formation of the lysosome-peroxisome
contact site also requires
NPC1, suggesting that this contact
is important for cholesterol exit from the 
lysosome. 

Three of the peroxisomal genes identi- 
fied in this study have been implicated in 
human diseases. X-linked adrenoleukodystrophy,
infantile Refsum disease, and
Zellweger syndrome are caused by mutations
in ABCD1, PEX1, and PEX26 (Aubourg
and Wanders, 2013). Strikingly,
Chu et al. (2015) show that cells from 
patients suffering from each of these dis- 
eases accumulate cholesterol in lyso- 
somes to a similar extent as those from 
NPC patients. The potential contribution 
of defects in cholesterol trafficking to 
symptoms of these diseases must now 
be considered. 

The precise roles of the lysosome- 
peroxisome contact site in facilitating 
cholesterol transport out of the lysosome 
on its path to the PM remain to be determined.
Chu et al. (2015) provide evidence
from in vitro, as well as cell-based studies, 
that cholesterol is transferred from lyso- 
somes to peroxisomes via the lysosome- 
peroxisome membrane contact site, 
raising the possibility that cholesterol 
may transit through the peroxisome on 
its way to the PM. It is also possible 
that the contact site facilitates transport 
of cholesterol out of the lysosome 
directly to a different organelle, such as 
the ER. In support of this possibility, 
knockdown of the oxysterol binding 
protein-related protein 5 (ORP5), which 
is localized to the ER, has been shown 
to result in accumulation of NPC1-dependent
pools of cholesterol in the limiting
membrane of the lysosome (Du et al.,
2011). ORP5 may act in a parallel pathway,
or downstream of Syt7 and ABCD1, in
the transfer of cholesterol out of the 
lysosome, either to the ER or to the perox- 
isome. 

Different pathways of cholesterol trans- 
port out of the lysosome may function in 
different cell types or under different con- 
ditions and perhaps only a subset of these 
pathways are directed to the PM. A 
comprehensive assessment of the molec- 
ular components of the lysosome-peroxi- 
some contact, including the consideration 
Figure 1. The Lysosome-Peroxisome Contact Site Joins a Growing List of Inter-Organelle
Contact Sites 

Known contact sites between the ER-mitochondria, ER-PM, ER-vacuole, ER-endosome/lysosome, ER- 
peroxisome, ER-Golgi, ER-phagosome, ER-lipid droplet, mitochondria-PM, and mitochondria-lysosome/ 
vacuole are indicated. 



of a possible three-way lysosome-peroxi- 
some-ER junction, is necessary. This 
knowledge will be critical to understand- 
ing the peroxisome-dependent mecha- 
nisms of cholesterol transport out of the 
lysosome and for the development of 
disease therapies. 

The lysosome-peroxisome contact site 
joins a growing list of inter-organelle con- 
tact sites. The known membrane contact 
sites currently include: ER-mitochondria, 
ER-PM, ER-lysosome/vacuole, ER-endo- 
some, ER-Golgi, mitochondria-lysosome/ 
vacuole, and mitochondria-PM (Prinz, 
2014). Note that the lysosome-peroxi- 
some contact site is now the third 
example of a critical contact between 
the lysosome and another organelle. 
Among the identified contact sites, some 
share partial functional redundancy. For 
example, the ER-mitochondrial encounter
structure (ERMES) and the vacuole and 
mitochondria patch (vCLAMP) are distinct 
membrane contact sites that connect the 
mitochondria to the ER and the yeast 
vacuole, respectively (Elbaz-Alon et al., 
2014; Honscher et al., 2014). Loss 
of either the ERMES or the vCLAMP 
has minimal phenotypic consequences, 
whereas simultaneous loss of both is 
lethal. Additional inter-organelle contacts 
are likely to be discovered. 

In addition to specialized functions 
including calcium homeostasis and stor- 
age, intracellular signaling, organelle divi- 
sion, and lipid biosynthesis, membrane 
contact sites have repeatedly been 
shown to facilitate lipid transfer between 
membranes (Elbaz and Schuldiner, 2011;
Prinz, 2014). It is tempting to speculate 
that inter-organelle contacts are the 
major routes of lipid transfer between
cellular compartments and fundamental 
to the accurate distribution of distinct 
lipid species throughout the cell. 
Currently, we are limited in our ability 
to observe the movements of lipids 
within cells because robust assays for 
tracking most lipids via microscopy do 
not yet exist. The development of 
methods facilitating such observations 
will be invaluable to advancement of 
the field. 
Benanti et al. report that Burkholderia pseudomallei and Burkholderia mallei bacteria express
proteins that mimic Ena/Vasp family proteins to polymerize actin, thereby inducing actin-based 
motility. Thus, bacteria can use the various cellular actin polymerization mechanisms for intra- 
and inter-cellular dissemination. 



Intracellular bacterial pathogens such as 
Listeria monocytogenes and Shigella flexneri
received a lot of attention in the early
1990s, when it was discovered that, after 
internalization by mammalian cells and 
escape from the endocytic vacuole, they 
actively recruit monomeric actin and poly- 
merize it into filaments. This actin assem- 
bly process creates a force that then 
propels the bacteria through the cytosol. 
In this issue of Cell, Benanti et al. (2015)
reveal that bacterial pathogens Burkholderia
pseudomallei and Burkholderia mallei
move inside of cells by mimicking one
of the several cellular mechanisms that 
control actin polymerization. 

The study of bacterial motility led to the 
discovery of the role of Arp2/3 in actin 
polymerization (Welch et al., 1998). This 
seven-protein complex needs to be 
activated by WASP family proteins to 
generate a dense array of actin filaments. 
Polymerization of actin filaments takes 
place on preformed filaments at an angle 
of 70°, a characteristic feature that can 
be visualized in cells by electron micro- 
scopy or in vitro using fluorescent actin 
and Arp2/3. Other nucleators, such as formins
or, as recently shown, Ena/VASP proteins
(Winkelman et al., 2014), generate
long linear filaments. These proteins act
processively on barbed ends, on which
they sit during the entire polymerization
process. Interestingly, while Listeria and 
Shigella use Arp2/3, the intracellular pathogens
Rickettsia early in infection use
Arp2/3 and later on switch to a different 
mechanism, expressing proteins that 
mimic formins to promote actin-based 
motility (Welch et al., 1998; Suzuki et al., 
1998; Gouin et al., 2004; Reed et al., 
2014) (see below and Figure 1). 



It is, however, the first time that bacte- 
rial mimics of Ena/VASP proteins are 
characterized (Benanti et al., 2015). The 
proteins encoded by the two Burkholderia
species are called BpBimA and BmBimA 
and display three or one WH2 domains, 
respectively. WH2 domains bind to actin 
and are present in a number of actin- 
binding proteins, including Ena/VASP 
proteins. Two hallmarks of Ena/VASP 
proteins are their properties to oligomer- 
ize and to uncap actin filament barbed 
ends capped by capping proteins. Ena/ 
VASP proteins can thus nucleate actin, 
elongate filaments at their barbed ends, 
and bundle them. 

Previously, it was known that the nonpathogenic
species Burkholderia thailandensis
expressed a BimA protein called
BtBimA and that Arp2/3 was necessary 
to induce actin-based motility. Here, the 
authors demonstrate that BtBimA has a 
VCA domain, known to bind and activate
Arp2/3. In contrast, this domain is absent 
in BpBimA or BmBimA of the pathogenic 
species. A VCA domain is also present in
WASP family proteins and in ActA of Listeria
(Kocks et al., 1992). ActA is a bona fide
mimic of WASP family proteins (Welch 
et al., 1998) (Figure 1). In the case of 
Shigella, the outer-membrane protein
IcsA/VirG (Bernardini et al., 1989) recruits 
N-WASP (Suzuki et al., 1998), which 
in turn recruits and activates Arp2/3 
(Figure 1). In Rickettsia, early during infec- 
tion, the protein RickA, which displays a 
VCA domain, mimics WASP family proteins
and recruits Arp2/3 (Gouin et al.,
2004; Reed et al., 2014). Later, the surface
protein Sca2 acts as a formin (Reed et al.,
2014). Like Sca2, BimA proteins belong to a
family of bacterial proteins called “autotransporters”
because they have the
capacity to insert themselves in the outer 
membrane and then display their N-termi- 
nal parts on the outside of the bacterium. 
To this end, they trimerize via a coiled-coil 
domain that generates a pore through 
which transport does occur. In the case 
of BimA proteins, trimerization results in 
the exposure on the surface of nine WH2 
domains for B. pseudomallei and three
WH2 domains for B. mallei.

In their study, Benanti et al. perform a 
series of in vitro assays with purified proteins,
mutant analysis, observations of
comet tails, and plaque assays to analyze 
the dissemination of the bacteria from one 
infected cell to its neighbors. They find 
that BtBimA polymerizes actin if Arp2/3 
is present but does not in its absence. In 
contrast, the two other BimAs are able 
to polymerize actin in the absence of 
Arp2/3. They also demonstrate that the 
two proteins remain associated with 
the barbed ends as the filaments grow. 
The affinities of BpBimA and BmBimA 
for the barbed ends are in a similar range 
to those of formin and Ena/VASP for 
barbed ends, supporting the idea that 
they are functionally related. In addition, 
the two BimAs can gather two filaments 
and elongate them as VASP does, as 
well as displace CapZ from the barbed
ends. By creating mutations in the 
coiled-coil domain, the authors establish 
that trimerization is critical for actin nucle- 
ation, barbed end elongation, and anti- 
capping activity. They then compare the 
tails produced by B. thailandensis, B.
pseudomallei, and B. mallei and find that
they are curved for B. thailandensis and
straight for the two other species. Importantly,
the efficiency of movement (length
Cell 161, April 9, 2015 ©2015 Elsevier Inc. 199




[Figure 1 schematic. Panels: Listeria; Shigella; Burkholderia thailandensis; Rickettsia, 30 minutes postinfection; Rickettsia, 48 hours postinfection; Burkholderia mallei; Burkholderia pseudomallei. Legend: actin, Arp2/3, CapZ, RickA, Sca2, BpBimA.]

Figure 1. Bacterial Actin-Based Motilities 

(Top) Schematic representation of the actin-based motilities of Listeria, Shigella, Rickettsia spp., and
Burkholderia spp.

(Bottom) Electron micrographs of L. monocytogenes tails (left) and R. conorii tails (from Gouin et al., 1999
and our unpublished data).



of displacement) is higher in the case of 
B. pseudomallei and B. mallei, as these 
bacteria moved in a linear path in contrast 
to B. thailandensis, which moved in a
curved path, with potential implications 
for the pathogenicity of these species. 



In summary, this study unveils how 
bacteria can exploit different mechanisms 
offered by the host cell to polymerize 
actin. In fact, the results raise the possibility
that other bacteria such as Rickettsia,
although belonging to the same genus,
may also use different mechanisms to
move inside of cells. Finally, it remains 
possible that BimA proteins play other 
roles in infection, as is the case of ActA, 
which covers the bacterial surface and 
as a Trojan horse protects Listeria from 
autophagy (Yoshikawa et al., 2009), 
Although cancer immunotherapy can lead to durable outcomes, the percentage of patients who
respond to this disruptive approach remains modest to date. Encouragingly, nanotechnology 
can enhance the efficacy of immunostimulatory small molecules and biologics by altering their
co-localization, biodistribution, and release kinetics. 



Awakening the Immune System 

Although the research community has made great inroads into 
understanding the underlying etiology of cancer, our ability to 
confer durable responses to patients remains rather limited. 
The complexity of cancer aside, a major obstacle impeding our 
progress has been the widespread emphasis on cancer as a 
cell-autonomous disease. Few biologists would study gill phys- 
iology by removing a fish from water, yet we routinely interrogate 
cancer cells outside of their natural habitat, discounting the 
importance of the tumor microenvironment. In addition to stro- 
mal cells and extracellular matrix, immune cells greatly impact 
disease initiation, progression, and invasion. 

Indeed, the type, density, and location of immune cells within 
tumors predict patient survival as well as, if not better than, 
traditional histopathological methods. This so-called “immune 
contexture” (most notably the presence of CD8+CD45RO+
T cells and Th1 cells) is associated with a good prognosis
across at least 20 different cancer types (Fridman et al., 2012). 
Accordingly, oncologists are eager to arouse exhausted immune 
cells, and clinical data confirm that stimulating a patient’s natural 
antitumor immune response can cure relapsed, refractory pa- 
tients with difficult-to-treat cancers who have exhausted other 
treatment options (Topalian et al., 2011).

Challengingly, tumors can evade immune surveillance. Conse- 
quently, most immunotherapies, particularly those directed 
against solid tumors, have thus far benefited only a minority of 
patients. For this reason, facilitating antitumor immune cells to 
overcome the activation energy barrier presented by the immuno- 
suppressive tumor microenvironment is an area of active investi- 
gation. Emerging preclinical and clinical data suggest that delivery 
of immunostimulatory molecules from nanoparticles and scaffolds 
can rouse the immune system with greater rigor than delivery of 
these same molecules in solution, leading to improved antitumor 
immunity and survival outcomes. Accordingly, biologists and 
engineers are working to improve our understanding of which cells 
and pathways should be perturbed to maximize efficacy and what 
tools are most appropriate to perturb them as desired. 

The Killer App for Nanomedicine? 

Nanoparticles are synthetic particles (generally derived from 
polymers, lipids, or metals) with sizes on the nanometer scale,
which confers properties that bridge bulk and molecular struc-
tures. Such nanoparticles can be loaded with therapeutic com- 
pounds to achieve concentrated local drug delivery with poten- 
tial for sustained release when biodegradable carriers are used. 
Their high surface-area-to-volume ratio enables them to be 
coated with various ligands (e.g., antibodies or aptamers) that 
can facilitate interaction with cognate molecules, including re- 
ceptors present on the surface of target cells. Although nanopar- 
ticles can improve the pharmacokinetic properties of their drug 
payloads (Chow and Ho, 2013), their ability to target cancer cells
specifically and efficiently has proven somewhat elusive. Target- 
ing nanoparticles to specific receptors on cancer cells augments 
cellular uptake but not tumor localization, which is governed by 
passive accumulation through leaky vasculature. In contrast, 
leukocytes can actively traffic to tumors along chemokine gradi- 
ents, rendering these cells the ultimate “targeted” therapy. 
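
The high surface-area-to-volume ratio mentioned above follows from simple geometry. The sketch below assumes idealized spherical particles, for which the ratio equals 3/r; the two radii are illustrative choices rather than values from the studies cited, and the function name is hypothetical.

```python
import math


def surface_to_volume_per_nm(radius_nm: float) -> float:
    """Surface-area-to-volume ratio (1/nm) of an idealized sphere."""
    if radius_nm <= 0:
        raise ValueError("radius must be positive")
    area = 4.0 * math.pi * radius_nm ** 2
    volume = (4.0 / 3.0) * math.pi * radius_nm ** 3
    return area / volume  # simplifies to 3 / radius_nm


nano_radius_nm = 50.0      # assumed nanoparticle radius
micro_radius_nm = 5000.0   # assumed 5 um bead, for comparison

# Shrinking the radius 100-fold raises the ratio 100-fold, so far more
# surface is available for ligands per unit of carrier volume.
gain = surface_to_volume_per_nm(nano_radius_nm) / surface_to_volume_per_nm(micro_radius_nm)
print(f"relative surface-to-volume gain: {gain:.0f}x")
```

Real nanoparticles are rarely perfect spheres, so this is best read as an order-of-magnitude illustration of why small carriers can display many ligands relative to their volume.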

Delivery of immunostimulatory drugs to antitumor immune 
cells may be a more efficient tactic to eradicate tumors than 
delivery of cytotoxic drugs to cancer cells (Figure 1). While the 
ability to concentrate nanoparticles within tumors upon sys- 
temic administration remains a challenge, immune cells prolifer- 
ate extensively upon activation. As a consequence, unlike for 
cancer cells, successful payload delivery to even a small fraction 
of immune cells can achieve robust antitumor efficacy. More- 
over, tumors are heterogeneous and ever evolving, so drugs 
that are designed to kill cancer cells directly by targeting cell- 
intrinsic pathways inherently select for resistant clones that 
lead to relapse. In contrast, immune cells can generate a coordi- 
nated and adaptive antitumor response with capacity for mem- 
ory that is not achievable using any other therapeutic modality. 

Improving the Efficacy of Cancer Vaccines 

Dendritic cells (DCs) are critical initiators of adaptive immune re- 
sponses and are thus extremely relevant targets for anticancer 
nanomedicines. Co-administration of antigen and adjuvant as 
free drugs can result in delivery of antigen to some DCs and adju- 
vant to others. Delivery of antigen in the absence of adjuvant 
induces immunologic tolerance, thereby inhibiting robust anti- 
tumor responses. Co-encapsulation of antigen and adjuvant in 
a common particle enables co-delivery of both components to 
the same DC, leading to improved induction of antigen-specific 
CD8+ T cells, which are critical mediators of antitumor immunity.
Sustained antigen release from a particle within DCs can further 
enhance cytolytic T lymphocyte (CTL) priming in vitro by extend- 
ing antigen presentation (Audran et al., 2003). Such particles 
serve as antigen reservoirs, thereby mimicking both prime and 
boost injections following a single administration. 

As yet, the induction of robust T cell responses in large 
animal models has not been achieved using traditional protein 
vaccine-based approaches, which have primarily elicited hu- 
moral B cell responses. Excitingly, preliminary studies suggest 
that nanoparticle-based vaccines may confer cross-priming 
efficacy in non-human primates and humans similar to that 
observed in mice. Should such findings be validated in larger co- 
horts, then nanovaccines would serve as an important break- 
through for the development of vector-free vaccines (Irvine 
et al., 2013). 

Owing to their pathogen-like size, nanoparticles are readily 
taken up by antigen-presenting cells, such as DCs, which are 
natural phagocytes. As a consequence, even untargeted 
nanoparticles improve the uptake— which often correlates with 
antitumor efficacy— of cancer vaccines relative to their soluble 
forms. Altering particle size, hydrophobicity, and surface charge, 
as well as conjugating targeting ligands, can further enhance uptake
efficiency (Cruz et al., 2012). Targeting nanoparticles to DCs
has proven much more feasible than targeting nanoparticles to 
cancer cells. This difference is rooted in physics as much as in 



Figure 1. Applications of Nanotechnology 

Clockwise from bottom left: nanocarriers can be 
used to (1) deliver cancer vaccine antigens and 
adjuvants to dendritic cells, (2) stimulate T cells 
directly as artificial antigen presenting cells, (3) 
concentrate immunostimulatory compounds in the 
immunosuppressive tumor microenvironment, 
and (4) deliver supportive drugs to T cells in the 
circulation. (Image credit to Mohammad H. Saleh.) 



biology. First, owing to the fenestrated 
architectures of secondary lymphoid 
organs, nanoparticles naturally accumu- 
late in these structures— particularly the 
spleen— which are populated by many 
DCs. Second, secondary lymphoid or- 
gans do not exhibit the physical barriers 
to entry that are characteristic of solid 
tumors, such as elevated interstitial pres- 
sure and impaired diffusion caused by 
unusually dense extracellular matrix. 

Of note, the subset of DC targeted is 
critical to defining the induction and regu- 
lation of immune responses (Ueno et al., 
2011). Plasmacytoid DCs can be converted
from tolerogenic to innate immunostimulatory
upon uptake of Toll-like
receptor (TLR) 7 and/or 9 agonists. To 
achieve adaptive responses, distinct sub- 
sets of classical DCs can be targeted 
by nanoparticles derived from poly(lac- 
tic-co-glycolic acid) (PLGA)— a biode- 
gradable, FDA-approved polymer— to which antibodies are 
coupled. The C-type lectin receptor on the surface of the DC 
that is targeted by the antibody defines the type of immune 
response produced. For example, targeting of DC-SIGN, DEC-205,
DNGR-1, and Langerin favors CD8+ T cell cellular (Th1) responses,
whereas targeting of DCIR2 favors CD4+ T cell and B
cell humoral (Th2) responses (Cruz et al., 2012). Vaccine potency
may be maximized by targeting multiple DC subsets, thereby 
inducing both cellular and humoral immune responses (Ueno 
et al., 2011). 

In addition to delivering information regarding specificity 
and activity to DCs, some investigators have considered the 
design of artificial antigen-presenting cells (aAPCs) that can 
cross-prime antigen-specific CD8+ T cells directly. Synthetic
aAPCs are particles to which proteins required for T cell activation,
such as MHC-epitope or agonist anti-CD3 (Signal 1 to
the T cell receptor) and agonist anti-CD28 (co-stimulatory
Signal 2), have been conjugated. Manipulating particle shape
and geometry revealed that aAPC activity correlates with aspect 
ratio (Sunshine et al., 2014). Mechanistically, CD8+ T cells
migrate preferentially to the long axis of ellipsoidal aAPCs, and 
this extended length of contact increases T cell proliferation 
and, consequently, tumor prevention. 

These data not only have relevance to design parameters 
for future aAPCs but also provide insights into the funda- 
mental biology of DC-T cell interactions. Indeed, the result was 
unexpected, as high aspect ratio has previously been associated
with increased particle internalization by non-phagocytic cells. In 
addition to nanoparticles, scaffolds can be used to vaccinate 
against cancer. Scaffolds similarly offer practical and functional 
advantages over conventional DC-based vaccines, which 
require isolation, ex vivo manipulation, and reintroduction of a 
patient’s DCs. 

Scaffolds: Customized Microenvironments 

Polymeric scaffolds and hydrogels can be implanted or injected 
to generate a modular, tailored local microenvironment that can 
co-localize inflammatory cytokines, tumor antigen, and immune 
danger signals in situ. For example, incorporation of the cytokine
GM-CSF, autologous tumor lysate, and the TLR9 agonist
CpG-oligonucleotide into a subcutaneously implanted porous 
PLGA scaffold promotes recruitment and activation of DCs, resulting
in regression of established local and distant tumors (Ali
et al., 2009). The observed potency is attributed to the persistent
presence of antigen and adjuvant signaling in the depot, which 
is not attainable upon delivery of soluble vaccine components 
that diffuse away fairly rapidly. This exciting scaffold-based 
approach is currently being investigated in a phase I clinical trial 
(NCT01753089). To avoid the need for surgical implantation, an
injectable spontaneously assembling scaffold has been devised 
(Kim et al., 2015). Specifically, mesoporous silica rods with high 
aspect ratios form macroporous structures that provide a 
favorable microenvironment for DCs, which subsequently traffic 
to lymph nodes and provoke adaptive immune responses. 

Scaffolds can similarly be used to improve the function of 
adoptively transferred T cells by providing a supportive immuno- 
logic microenvironment. Adoptive cell transfer (ACT), particularly 
upon introduction of chimeric antigen receptors into T cells 
(Maude et al., 2014), can lead to sustained remissions in 
hematologic malignancies. Solid tumors, however, establish a 
concentrated immunosuppressive microenvironment that ham- 
pers the efficacy of ACT. Transplantation of lymphocytes in 
biodegradable polymeric scaffolds can sustain expansion and 
release of tumor-reactive T cells at tumor resection sites and 
enhance their antitumor potency (Stephan et al., 2015). Scaf- 
fold-derived T cells reduce residual disease and relapse much 
more effectively than free T cells administered systemically or 
locally. Such depots provide proof of concept for localized 
delivery of cells, in addition to small molecules and biologics.
Localized immunotherapy is particularly well suited for treatment 
of inoperable or incompletely removed tumors to prevent local 
recurrence (Stephan et al., 2015), and its effects can have wide- 
spread implications. 

Localized Nanoimmunotherapy: Focal Impact 

Achieving a robust local antitumor effect, as previously 
observed for the radiation-induced “abscopal effect,” can 
generate a systemic antitumor immune response that can erad- 
icate disseminated disease, including metastases situated in 
sites generally thought to be tumor cell havens in the context 
of traditional systemic therapy (Marabelle et al., 2013). Lipo- 
somes can be used to anchor immunomodulatory compounds, 
such as immunostimulatory nucleic acids and biologics (Kwong 
et al., 2013), prior to intratumoral injection. Such particles restrict 
the biodistribution of these compounds and prolong their retention 
at the tumor site. In so doing, localized nanoimmunotherapy 
reduces systemic toxicity and thus improves the therapeutic 
window of extremely potent immunostimulatory molecules while 
still promoting systemic antitumor immunity (Kwong et al., 2011). 

Delivering Tx to T Cells in Circulation 

Immunoengineering also enables drug delivery directly to T cells. 
The conjugation of nanoparticles loaded with supportive com- 
pounds to the surface of adoptively transferred T cells leads to 
persistent autocrine-like signaling among these “pharmacytes” 
(Stephan et al., 2010). This approach again demonstrates the 
impact of nanotechnology relative to administration of free drug 
and represents a paradigm that can be applied more broadly 
than the ACT-supportive scaffold described above, as it does 
not necessitate surgical implantation. Ideally, one would be 
able to deliver such adjuvant drug-containing nanoparticles to 
T cells upon systemic administration, enabling a generalized 
approach that does not require ex vivo cell manipulation for 
each patient. Excitingly, liposomes to which targeting ligands 
(antibody fragments or cytokines) have been conjugated can 
target drug delivery to adoptively transferred T cells in vivo (Zheng 
et al., 2013). Future work will likely enable targeted delivery to 
endogenous T cells and, ultimately, other cell types as well. 

Concentrating Catalysis 

In addition to delivering small molecules, oligonucleotides, anti- 
gens, and cytokines, nanoparticles can be used to concentrate 
enzymes in vivo. For example, particles can be used to degrade 
neutrophil extracellular traps (NETs). NETs are extracellular DNA 
structures that, when formed intravascularly, can sequester circu- 
lating tumor cells and thereby promote metastasis. Digesting 
NETs with free DNase is relatively inefficient at inhibiting metas- 
tasis (Cools-Lartigue et al., 2013), while DNase-coated nanopar- 
ticles vastly improve therapeutic efficacy (J. Park, R.W. Wysocki, 
Z. Amoozgar, M.S.G., and M. Egeblad, unpublished data). 

Looking Ahead 

Moving forward, the field of immunoengineering will benefit from 
a broader adoption of novel tools that permit multiplex analysis 
of cell type (multiplexed ion beam imaging), cell activation state 
(mass cytometry), and soluble mediators of stimulation/inhibition 
(Luminex) in the tumor microenvironment and circulation 
following perturbation. By allowing for interrogation of several- 
fold more parameters simultaneously than conventional meth- 
odologies, such as flow cytometry and ELISA, such tools will 
yield insights into the coordination of the highly complex immune 
system. A comprehensive understanding of the downstream 
impacts of our interventions, including expression of co-stimula- 
tory/inhibitory ligands and production of immunoregulatory 
cytokines, will enable rational product revision for improved 
therapeutic outcomes. 

Beyond enhancing our appreciation of the cellular and 
biochemical constituents of the tumor microenvironment, we 
will benefit from an increased consideration of the physical 
microenvironment in the tumor, as well as its draining lymph no- 
des (Swartz and Lund, 2012). Indeed, extracellular matrix serves 
as a physical mediator of immunosuppression by preventing 
penetration of immune cells into the tumor core (Salmon and 
Donnadieu, 2012). Immunoengineering can be used to alter the 
physical microenvironment of tumors, for example, by modifying 
peritumoral extracellular matrix (Kanapathipillai et al., 2012). 

To this end, in addition to enabling more thorough descriptions 
of immune cell function (reading), advanced technologies can 
be used to create physical lymphoid-like structures to study 
and manipulate immune cell function (writing). Improving the 
reproducibility of formulation and fabrication methods is critical, 
as manufacturing represents perhaps the greatest obstacle con- 
fronting the clinical translation of nanodevices. Controlled pro- 
duction can minimize polydispersity. For example, 3D printing 
is revolutionizing the field of regenerative medicine, and it has 
the potential to influence cancer immunoengineering similarly. 
Initially, this technology will likely be applied to produce defined, 
improved scaffolds for vaccine applications or supportive ACT. 
In the years ahead, it could be used to create implantable artifi- 
cial tertiary lymphoid structures, which possess defined zones 
for specialized immune cells and are important to long-term can- 
cer patient survival (Fridman et al., 2012). 

Transplantable lymphoid-like organoids can already be engi- 
neered to manifest discrete compartments for particular immune 
cells, which generate functional humoral and cellular responses 
to vaccination (Suematsu and Watanabe, 2004). 3D printing will 
allow for deposition of specific cytokines, immune cells, and 
matrix with unprecedented accuracy. This advance will have 
relevance not only to translational biology but also to basic 
immunology, as engineered scaffolds can enhance our under- 
standing of the biochemical and physical microenvironments 
that alter the balance between tolerance and rejection (Swartz 
et al., 2012). 

By concentrating the delivery of their payloads, nanoparticles 
permit the use of considerably lower doses of immunostimula- 
tory molecules to achieve a given response and thereby enhance 
the safety profiles of these drugs (Irvine et al., 2013). Still, the 
materials from which nanodevices are created can inherently provoke 
a host response, so meaningful safety parameters must 
be defined, such as serum levels of type I interferons and IL-6. 
Unlike for prophylactic vaccines, such responses are likely 
acceptable to cancer patients and may even be beneficial in 
stimulating antitumor immunity, but they must be well under- 
stood nonetheless. Encouragingly, tocilizumab (anti-IL6R) has 
been used to manage cytokine-release syndrome successfully 
in the acute setting (Maude et al., 2014). Placing an emphasis 
on the development of safe biomaterials will facilitate earlier 
translation of immunoengineered products into patients. Data 
gleaned from patients will be more informative than anything 
that can be derived from preclinical models. 

Emerging evidence confirms that cancer immunotherapies, 
which can generate adaptive and durable responses, yield 
much more robust antitumor effects when they are formulated 
in nanoparticles or scaffolds than when they are administered 
as free drugs. Cancer immunoengineering is thus a promising 
area worthy of further consideration and investigation. It is hoped 
that this piece will stimulate basic biologists to engage bioengineers 
and to articulate the questions that they would like to see 
addressed with innovative technologies. In addition to its therapeutic 
potential, immunoengineering provides a valuable tool for 
dissecting fundamental biology. 

Research on two fronts has enabled the development of therapies that provide significant benefit to 
cancer patients. One area stems from a detailed knowledge of mutations that activate or inactivate 
signaling pathways that drive cancer development. This work triggered the development of tar- 
geted therapies that lead to clinical responses in the majority of patients bearing the targeted 
mutation, although responses are often of limited duration. In the second front are the advances 
in molecular immunology that unveiled the complexity of the mechanisms regulating cellular im- 
mune responses. These developments led to the successful targeting of immune checkpoints to 
unleash anti-tumor T cell responses, resulting in durable long-lasting responses but only in a frac- 
tion of patients. In this Review, we discuss the evolution of research in these two areas and propose 
that intercrossing them and increasing funding to guide research on combinations of agents represent 
a path forward for the development of curative therapies for the majority of cancer patients. 



Introduction 

The scientific community united against a common enemy in 
1971 when President Nixon signed a bill initiating the “War on 
Cancer,” which provided funding for scientific research focused 
on improving our understanding and treatment of cancer. 
Without doubt, the intervening years have seen great 
advances in the elucidation of the molecular mechanisms that 
regulate growth and death of normal cells, including a deep 
understanding of how these pathways progressively go awry 
during the development of cancer. This understanding led to 
the era of genomically targeted therapies and “precision medi- 
cine” in the treatment of cancer. Genomically targeted therapies 
can result in remarkable clinical responses. Nonetheless, the ability 
of cancer cells to adapt to these agents by virtue of their genomic 
instability and other resistance mechanisms eventually leads to 
disease progression in the majority of patients. 
Unraveling the mechanisms by which cancer cells become resis- 
tant to drugs and developing new agents to target the relevant 
pathways have become logical next steps in this approach for 
cancer treatment. However, given the genetic and epigenetic 
instability of cancer cells, it is likely that each new drug or com- 
bination of drugs targeting the tumor cells will meet with more 
complex mechanisms of acquired resistance. Recent findings 
suggest that T cells, bearing antigen receptors that are gener- 
ated by random rearrangement of gene segments, followed by 
selective processes that result in a vast repertoire of T cell 
clones, provide sufficient diversity and adaptability to match 
the complexity of tumors. Discoveries regarding regulation of 
T cell responses have provided key principles regarding immune 
checkpoints that are being translated into clinical success, with 
durable responses and long-term survival greater than 10 years 
in a subset of patients with metastatic melanoma, as well as 
yielding promising results in several other tumor types. Now, 
with the perspective of combining genomically targeted agents 
and immune checkpoint therapies, we are finally poised to 
deliver curative therapies to cancer patients. To support this 
goal and accelerate these efforts, changes in directions of 
research support and funding may be required. 

Precision Medicine: Targeting the Drivers 

In the past three decades, enormous strides have been made in 
elucidating the molecular mechanisms involved in the develop- 
ment of cancer (Hanahan and Weinberg, 2011). It is now clear 
that the oncogenic process involves somatic mutations that 
result in activation of genes that are normally involved in regula- 
tion of cell division and programmed cell death, as well as inac- 
tivation of genes involved in protection against DNA damage or 
driving apoptosis (Bishop, 1991; Solomon et al., 1991; Weinberg, 
1991; Knudson, 2001). These genetic links led to the decision 
early in the war on cancer to undertake sequencing of cancer 
genomes to provide a comprehensive view of somatic muta- 
tional landscapes in cancer and identify possible therapeutic tar- 
gets. Infrastructure and funding were provided to coordinate the 
sequencing efforts. It has become apparent that the level of 
somatic mutations differs widely between and within different 
tumor types ranging from very low rates in childhood leukemias 
to very high rates in tumors associated with carcinogens 
(Alexandrov et al., 2013). 

Mutations can be divided into two broad classes: those whose 
products “drive” tumorigenesis in a dominant fashion and 
“passengers” with no obvious role in tumor causation. The Cancer 
Genome Atlas (TCGA) projects have enabled identification of 
many of these mutations (Chen et al., 2014; Cancer Genome 
Atlas Research Network, 2014). This has allowed for the rational 
design of drugs that target and selectively interfere with oncogenic 
signaling pathways. This approach has revolutionized 
cancer medicine by moving away from the “one size fits 
all” approach (for instance, traditional chemotherapy, which 
attacks all dividing cells, including both cancer cells and 
differentiating or regenerating normal cells) to a more personalized 
strategy of treating patients with a specific drug only if their 
cancer bears particular molecular mutations that are the target of 
that drug. 

As an example of genomically targeted therapies, an inhibitor 
against BRAF was developed when it was discovered that 
~40%-60% of cutaneous melanomas carry mutations in 
BRAF, which induce constitutive activation of the MAPK 
pathway (Curtin et al., 2005; Davies et al., 2002). In a randomized 
phase III trial comparing a BRAF inhibitor (vemurafenib) versus 
dacarbazine, the vemurafenib treatment group had a response 
rate of ~48% versus 5% in the dacarbazine arm (Chapman 
et al., 2011). However, the median duration of response was 
short, only 6.7 months (Sosman et al., 2012). Another oncogenic 
pathway that has been targeted is the tyrosine kinase chromosomal 
rearrangement, which results in the fusion oncogene 
EML4-ALK that is found in ~5% of non-small-cell lung cancer 
(NSCLC) patients (Soda et al., 2007). The EML4 fusion partner 
mediates ligand-independent oligomerization and/or dimerization 
of anaplastic lymphoma kinase (ALK), resulting in constitutive 
kinase activity. 
Standard chemotherapies in this subgroup of patients have 
been associated with response rates of up to 10% (Hanna 
et al., 2004). Crizotinib, a tyrosine kinase inhibitor targeting 
ALK (Kwak et al., 2010), was shown to elicit a response rate of 
~65% with a median duration of response of less than 8 months 
in a phase III trial (Shaw et al., 2013). Although there was a 
significant increase in progression-free survival for patients treated 
with crizotinib, regrettably, there was no overall survival benefit 
in the interim analysis. Therefore, although the concept of targeting 
“driver mutations” has great merit and has demonstrated 
clinical responses, the reality remains that the majority of 
patients treated with these agents will derive short-term clinical 
responses with eventual development of resistance mechanisms 
that lead to disease progression and death. 

Mechanisms operative in acquired resistance fall into three 
main categories: alterations in the targeted gene (as a result of 
mutation, amplification, or alternative splicing); other changes 
that do not affect the original target but re-activate the signaling 
pathway involved (e.g., NRAS and MEK mutations in BRAF 
mutant melanoma); and changes that activate alternate pathways 
(such as activation of growth factor receptors). Considerable 
effort has gone into finding ways to enhance efficacy of 
genomically targeted therapies. One effort involves multiple 
agents that target different molecules in the same pathway, 
such as the combination of a BRAF inhibitor and a MEK inhibitor 
(Larkin et al., 2014; Robert et al., 2015a). This approach helps to 
reduce compensatory feedback loops, as well as to block the 
development of resistance due to mutations downstream in that 
pathway. A different strategy consists of blocking parallel pathways 
to prevent emerging resistance (Martz et al., 2014). Still, 
the chief challenge of these combinatorial approaches is the 
multiplicity of resistance mechanisms and the fact that different 
mechanisms may be in operation in different cells due to intratumor 
heterogeneity. Given these observations, it is difficult to 
envision realistic approaches to effectively overcome the myriad 
of resistance mechanisms that may arise in the course of cancer 
treatment. The continued evolvability of the tumor cells and their 
mechanisms of escape from targeted therapies raise the question 
as to whether combinations of genomically targeted agents 
will ever be curative. 

Advantages of Mobilizing T Cells for Cancer Therapy 

As the knowledge of the intricate biology of cancer has progressed, 
so has the understanding of the fundamental cellular 
and molecular mechanisms that orchestrate the interplay of 
the innate and adaptive arms of the immune system. In a 
simplistic way, the innate system is composed primarily of cytokines, 
the complement system, phagocytes such as macrophages, 
neutrophils, and dendritic cells, and natural killer (NK) cells. 
Cells of the innate immune system have hard-wired receptors to 
detect products of infectious microorganisms and dying cells. 
Macrophages and neutrophils provide an early defense against 
microorganisms, whereas dendritic cells provide a key interface 
to the adaptive immune system, composed of B and T cells with 
their somatically generated, clonally expressed repertoire of 
antigen receptors. 

The understanding of the basic principles governing 
immunity provided the rationale for the development of 
powerful strategies to actively engage the immune system for 
cancer therapy. Strategies to unleash T cells against tumors 
are particularly compelling, as the activity of these cells presents 
important features that are advantageous over other cancer 
therapies. The first is their specificity. T cells express antigen 
receptors that recognize cell-surface complexes of MHC molecules 
and peptides sampled from virtually all the proteins in the 
cell and are not limited to peptide antigens derived from 
cell-surface molecules. The second feature is memory. Primary 
T cell responses are generally followed by the production of 
long-lived memory T cells with accelerated kinetics of secondary 
response if the antigen recurs. Finally, the T cell response is 
adaptable and can accommodate not only tumor heterogeneity 
but also responses to novel antigens expressed by recurring 
tumors. It has been calculated that the somatic recombination 
process that generates the antigen receptors of T cells can 
generate as many as 10^15 different receptors (Davis and Bjorkman, 
1988). Of this theoretical number, each individual human 
has perhaps 10^9 different receptors. The immense size of the 
repertoire suggests that the immune system is indeed well 
equipped to deal with the mutability and adaptability of cancer. 

Harnessing T Cell Responses to Tumor Antigens 

With the advent of genomic and cDNA expression cloning 
methods and sequencing of peptides eluted from tumor cell 
MHC molecules, an avalanche of tumor antigens defined by 
tumor-specific T cells has been identified in both mice and humans. 
Most of these are shared between cancer cells of different 
individuals and fall into four groups: products of oncogenic 
viruses (Epstein-Barr virus in certain leukemias and human 
papilloma virus in cervical and some head and neck cancers); 
antigens related to tissue-specific differentiation molecules 
(tyrosinase and related proteins in melanoma and prostate-spe- 
cific antigen and prostatic acid phosphatase in prostate cancer); 
molecules normally expressed only during fetal development 
(carcino-embryonic antigen in colon cancer, α-fetoprotein in liver 
cancer); and cancer-testis (CT) antigens, which are normally expressed 
during gametogenesis but are found in many cancer 
cells as a result of changes in epigenetic regulation (MAGE and 
NY-ESO-1). 

Additionally, somatic mutations also can result in the genera- 
tion of tumor-specific peptides with the potential to bind major 
histocompatibility complex (MHC) molecules and therefore be 
recognized by the immune system as neoantigens (Sjoblom 
et al., 2006; Segal et al., 2008). The analysis of the epitope landscape 
of breast and colorectal carcinoma cells revealed that the 
products of seven and ten mutant genes in colorectal and 
breast cancer, respectively, have the potential for binding to 
HLA-A*0201 alone. Because each heterozygous individual 
carries as many as 6 different HLA class I alleles, this means 
an average of 42-60 potential neoantigens that can be presented 
to T cells. In support of these estimates, recent studies have 
demonstrated that neoantigens generated by somatic mutation 
are recognized by T cells in both mouse and human cancers (Lin- 
nemann et al., 2015; Gros et al., 2014; Tran et al., 2014; Gubin 
et al., 2014). 
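
The 42-60 range quoted above follows from a simple multiplication, which the short sketch below spells out. It is illustrative arithmetic only: the function name and the assumption that every HLA class I allele presents roughly as many mutant peptides as HLA-A*0201 are ours, introduced for the example, and are not taken from the cited studies.

```python
# Illustrative arithmetic only; the inputs are the figures quoted in the text.
HLA_CLASS_I_ALLELES = 6  # upper bound for a fully heterozygous individual

# Mutant gene products predicted to bind HLA-A*0201 alone, per tumor type.
EPITOPES_PER_ALLELE = {"colorectal": 7, "breast": 10}


def estimated_neoantigens(epitopes_per_allele: int,
                          alleles: int = HLA_CLASS_I_ALLELES) -> int:
    """Scale a single-allele estimate to all class I alleles.

    Hypothetical simplification: each allele is assumed to present about
    as many mutant peptides as HLA-A*0201 does.
    """
    return epitopes_per_allele * alleles


estimates = {tumor: estimated_neoantigens(n)
             for tumor, n in EPITOPES_PER_ALLELE.items()}
print(estimates)
```

Under these assumptions the estimate is an upper bound, since individuals who are homozygous at one or more class I loci carry fewer than six distinct alleles.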

At first, as a result of earlier studies identifying shared anti- 
gens, the field of cancer immunotherapy became focused on 
developing therapeutic vaccines to expand T cells against these 
shared antigens expressed on tumors. Many studies focused on 
stimulating T cell responses with peptides, proteins, whole- 
tumor cells including those modified to express cytokines, 
DNA, recombinant viral-based vaccines, or antigen-pulsed den- 
dritic cells given alone or in combination with various adjuvants 
or cytokines. Although these trials were conducted with the 
best available science at the time and provided promising anec- 
dotal evidence that induction of immune responses could elicit 
clinical benefit, they remained largely negative and generally 
failed to show objective clinical responses (see Rosenberg 
et al., 2004 for review). Enthusiasm waned somewhat as the 
number of failed clinical trials mounted. 

Many reasons might have contributed to the failure of these 
vaccination strategies, including choice of antigen, failure to pro- 
vide adequate costimulation, or functional inactivation of tumor- 
reactive T cells (Melero et al., 2014). A number of T-cell-extrinsic 
suppressive mechanisms such as TGFβ, FoxP3+ regulatory 
T cells (Treg), and tryptophan metabolites (IDO) that can hamper 
anti-tumor responses have also been identified, and there have 
been efforts to minimize the suppressive effects of these in 
pre-clinical and clinical studies. 

Unraveling the Complexity of T Cell Activation 

Another contributing factor to the failure of earlier cancer vaccine 
trials was perhaps the lack of understanding and appreciation of 
the full complexity of cell-intrinsic pathways that regulate T cell 
activation. By the late 1980s, it was known that simple engagement 
of peptide/MHC complexes by the antigen receptor is 
insufficient for activation of T cells and may render them anergic 
(Jenkins and Schwartz, 1987; Mueller et al., 1989). In order to 
become fully activated, T cells must encounter antigen in the 
context of antigen-presenting cells (APCs) such as dendritic 
cells, which provide costimulatory signals mediated by B7 molecules 
(B7-1 and B7-2) that will engage their receptor, CD28, on 
the T cell (Greenwald et al., 2005). Thus, T cells specific for a 
tumor antigen will not be activated by an initial encounter with 
tumor cells or may even be rendered anergic because, with the 
exception of a few lymphomas, tumors do not express costimu- 
latory B7 molecules (Townsend and Allison, 1993). Thus, tumors 
are essentially invisible to T cells until the T cells are activated as 
a result of cross-priming by dendritic cells that present tumor 
antigens acquired from dying tumor cells. Simultaneous recogni- 
tion of antigen/MHC complexes and costimulatory ligands by 
T cells initiates a complex set of genetic programs that result in 
cytokine production, cell-cycle progression, and production of 
anti-apoptotic factors that result in proliferation and functional 
differentiation of T cells. Consistent with the importance of 
both antigen receptor and costimulatory signals in initiating 
anti-tumor responses, many therapeutic vaccines now incorpo- 
rate both antigen and dendritic cells or agents that enhance cos- 
timulatory signaling. 

By the mid-90s, it became clear that T cell priming elicits not 
only programs leading to induction of T cell responses but also 
a parallel program that will eventually stop the response. The critical 
inhibitory program is mediated by CTLA-4, a homolog of 
CD28 that also binds B7-1 and B7-2, although with much greater 
avidity than CD28. Expression of the ctla-4 gene is initiated 
upon T cell activation, and it traffics to and accumulates in the 
immunological synapse, eventually attenuating or preventing 
CD28 costimulation by competition for B7 binding and negative 
signaling (Walunas et al., 1994; Krummel and Allison, 1995). The 
fact that ctla-4 knockout mice suffer from a rapid and lethal 
lymphadenopathy (Waterhouse et al., 1995; Tivol et al., 1995; 
Chambers et al., 1997) speaks for a negative role for CTLA-4 in 
limiting T cell responses to prevent damage to normal tissues. 

Thus, activation of T cells as a result of antigen receptor 
signaling and CD28 costimulation is followed not only by induc- 
tion of genetic programs leading to proliferation and functional 
differentiation but also by induction of an inhibitory program 
mediated by CTLA-4, which will ultimately stop proliferation. 
Extrapolating this paradigm to anti-tumor T cell responses, if 
eradication of the tumor has not been completed by the time 
that the inhibitory signal of CTLA-4 is triggered, the T cells will 
be turned off and will be unable to complete the task. Impor- 
tantly, this also suggests that, after this program is initiated, 
vaccines used to stimulate antigen receptor signaling may 
actually serve to strengthen the “off” signal as a result of addi- 
tional induction of ctla-4 expression by antigen receptor 
signaling. In any event, this suggests the importance of shifting 
strategies for cancer immunotherapy from activating T cells to 
unleashing them. 

Inactivating the Brakes to Increase Anti-tumor Immunity 

Consistent with the observations that CD28 and CTLA-4 had 
opposing effects on T cell responses in vitro, in the late 90s, it 
was found that, although blocking antibodies to CD28 impaired 
anti-tumor responses in mice, blocking antibodies to CTLA-4 
enhanced anti-tumor responses in mouse tumor models (Leach 
et al., 1996). In fact, the treatment of mice with anti-CTLA-4 
antibodies as monotherapy results in complete tumor rejection 
and long-lived immunity. Later on, mechanistic studies revealed 
that anti-tumor activity was associated with an increased ratio of 
both CD4 and CD8 effector cells to FoxP3+ regulatory T cells 
(Quezada et al., 2006). The success of CTLA-4 blockade in these 
initial studies raised two compelling points. First, because the 
target molecule was on the T cell and not the tumor cell, it was 
feasible to imagine that the same strategy would work on 
many different histologic tumors, as well as on tumors caused 
by different genetic lesions. Second, taking into consideration 
that CTLA-4 inhibited CD28-mediated costimulation by a cell- 
intrinsic mechanism (Peggs et al., 2009), its blockade could allow 
for enhanced T cell costimulation, which in turn would increase 
the efficacy of tumor vaccines, as well as agents that kill tumor 
cells under conditions that promote inflammatory responses. 
These possibilities were further supported by the results of a 
series of studies in different mouse models, including the 
demonstration that blockade of CTLA-4 was not limited to any 
particular tumor type but was rather broadly effective. CTLA-4 
blockade also was able to synergize with a vaccine consisting of 
tumor cells engineered to express the cytokine GM-CSF to eradicate 
tumors (Hurwitz et al., 1998; van Elsas et al., 1999). Finally, 
CTLA-4 blockade could be combined with local delivery of irradiation, 
cryoablation, or an oncolytic virus to induce systemic tumor im- 
munity and eradication of distant metastases (Zamarin et al., 
2014; Waitz et al., 2012; Tang et al., 2014). These preclinical 
studies supported the development of clinical anti-CTLA-4 
therapy. 

Immune Checkpoint Therapy: The Clinical Success 

CTLA-4 blockade was translated to the clinic with a fully human 
antibody to human CTLA-4 (ipilimumab, Medarex, Bristol-Myers 
Squibb). Tumor regression was observed in phase I/II trials in 
patients with a variety of tumor types, including melanoma, renal 
cell carcinoma, prostate cancer, urothelial carcinoma, and 
ovarian cancer (Yang et al., 2007; Hodi et al., 2008; Carthon 
et al., 2010; van den Eertwegh et al., 2012). Two phase III clinical 
trials with ipilimumab were recently completed in prostate can- 
cer, the first in patients with castrate-resistant prostate cancer 
who had not received prior chemotherapy treatment and the 
second in a more advanced disease setting, in which patients 
with castrate-resistant prostate cancer presented disease that 
had progressed on chemotherapy treatment. The former trial is 
yet to be reported. The latter trial did not reach statistical 
significance (p value of 0.053) for a survival benefit in 
patients who received ipilimumab treatment. However, subset 
analyses indicate that patients who have favorable clinical char- 
acteristics such as lack of liver metastases do benefit from ipili- 
mumab therapy (Kwon et al., 2014). Two phase III clinical trials 
with anti-CTLA-4 (ipilimumab) were also conducted in patients 
with advanced melanoma and demonstrated improved overall 
survival for patients treated with ipilimumab (Hodi et al., 2010; 
Robert et al., 2011). Importantly, these trials indicate long-term 
durable responses with greater than 20% of treated patients 
living for more than 4 years, including a recent analysis indicating 
survival of 10 years or more for a subset of patients (Schadendorf 
et al., 2015). The FDA approved ipilimumab as treatment for 
patients with melanoma in 2011. 

The clinical success of anti-CTLA-4 opened a new field termed 
“immune checkpoint therapy” as additional T cell intrinsic path- 
ways were identified and targeted for clinical development 
(Sharma et al., 2011; Pardoll, 2012). Another T-cell-intrinsic 
inhibitory pathway identified after CTLA-4 was that mediated 
by PD-1 (programmed death 1) and its ligand PD-L1. PD-1 was 
initially cloned in 1992 in a study of molecules involved in negative 
selection of T cells by programmed cell death in the thymus 
(Ishida et al., 1992). Its function as an immune checkpoint was 
not established until 2000 upon identification of its ligands 
(Freeman et al., 2000). PD-L1 was then shown to protect tumor 
cells by inducing T cell apoptosis (Dong et al., 2002). Later, 
preclinical studies in animal models evaluated anti-PD-1 and 
anti-PD-L1 antibodies as immune checkpoint therapies to treat 
tumors (Keir et al., 2008). 

Much like CTLA-4, PD-1 is expressed only in activated T cells. 
However, unlike CTLA-4, PD-1 inhibits T cell responses by inter- 
fering with T cell receptor signaling as opposed to outcompeting 
CD28 for binding to B7. PD-1 also has two ligands, PD-L1 and 
PD-L2. PD-L2 is predominantly expressed on APCs, whereas 
PD-L1 can be expressed on many cell types, including cells 
comprising the immune system, epithelial cells, and endothelial 
cells. Antibodies targeting PD-L1 have shown clinical responses 
in multiple tumor types, including melanoma, renal cell carci- 
noma, non-small-cell lung cancer (Brahmer et al., 2012), and 
bladder cancer (Powles et al., 2014). Similarly, phase I clinical 
trials with a monoclonal antibody against PD-1 demonstrated 
clinical responses in multiple tumor types, including melanoma, 
renal cell carcinoma, non-small-cell carcinoma (Topalian et al., 
2012), Hodgkin’s lymphoma (Ansell et al., 2015), and head and 
neck cancers (Seiwert et al., 2014, J. Clin. Oncol., abstract). 
Recently, a large phase I clinical trial with an anti-PD-1 antibody 
known as MK-3475 showed response rates of ~37%-38% in 
patients with advanced melanoma, including patients who had 
progressive disease after prior ipilimumab treatment (Hamid 
et al., 2013), triggering the approval of MK-3475 (pembrolizumab,
Merck) by the FDA in September 2014. A phase III clinical
trial that treated patients with metastatic melanoma with a 
different anti-PD-1 antibody (nivolumab, Bristol-Myers Squibb, 
BMS) also demonstrated improved responses and overall sur- 
vival benefit as compared to chemotherapy treatment (Robert 
et al., 2015b). Nivolumab was FDA approved for patients with 
metastatic melanoma in December 2014. In addition, nivolumab 
was FDA approved in March 2015 for patients with previously 
treated advanced or metastatic non-small-cell lung cancer 
based on a phase III clinical trial, which reported an improvement 
in overall survival for patients treated with nivolumab as 
compared to patients treated with docetaxel chemotherapy. 

Because CTLA-4 and PD-1 regulate different inhibitory path- 
ways on T cells, combination therapy with antibodies targeting 
both molecules was tested and found to improve anti-tumor re- 
sponses in a pre-clinical murine model (Curran et al., 2010). A 
recently reported phase I clinical trial with anti-CTLA-4 in combi- 
nation with anti-PD-1 also demonstrated tumor regression 
in ~50% of treated patients with advanced melanoma, in most
cases with tumor regression of 80% or higher (Wolchok et al., 
2013). There are ongoing clinical trials with anti-CTLA-4 
(ipilimumab, BMS, or tremelimumab, MedImmune/AstraZeneca)
plus anti-PD-1 or anti-PD-L1 in other tumor types, with preliminary
data indicating promising results (Hammers et al., 2014,
J. Clin. Oncol., abstract; Callahan et al., 2014, J. Clin. Oncol., 
abstract) that highlight this combination as an effective immuno- 
therapy strategy for cancer patients. 

As with other cancer therapies, immune checkpoint therapies 
may lead to side effects and toxicities (see Postow et al., 2015; 
Gao et al., 2015 for recent reviews). Briefly, these side effects 
consist of immune-related adverse events that are defined by in- 
flammatory conditions, including dermatitis, colitis, hepatitis, 
pancreatitis, pneumonitis, and hypophysitis. These side effects
can be managed, usually by administration of immunosuppressive
agents such as corticosteroids, which do not
appear to interfere with clinical benefit that is derived from 
the immune checkpoint agents. The profile of side effects that 
occur with both anti-CTLA-4 and anti-PD-1/PD-L1 antibodies
is similar; however, the side effects appear to occur more
frequently in the setting of anti-CTLA-4 therapy as compared
to anti-PD-1 and anti-PD-L1 therapies. The continued success
of immune checkpoint therapies in the clinic will require educa- 
tion of the oncology community regarding recognition and treat- 
ment of the side effects elicited by these agents. 

Novel Immunologic Targets for Cancer Immunotherapy 

Although blockade of the CTLA-4 and PD-1/PD-L1 pathways is 
furthest along in clinical development, it only represents the tip 
of the iceberg in the realm of potential targets that can serve to 
improve anti-tumor responses. Ongoing studies on regulation 
of immune responses have led to the identification of multiple 
other immunologic pathways that may be targeted for the devel- 
opment of therapies, either as monotherapy or in combination 
strategies, for the successful treatment of cancer patients. These 
include immune checkpoints or inhibitory pathways, as well as 
co-stimulatory molecules, which act to enhance immune re- 
sponses. A partial list of new immune checkpoints that are being 
evaluated in pre-clinical tumor models and/or in the clinic with 
cancer patients includes LAG-3 (Triebel et al., 1990), TIM-3 
(Sakuishi et al., 2010), and VISTA (Wang et al., 2011), whereas 
co-stimulatory molecules include ICOS (Fan et al., 2014), OX40
(Curti et al., 2013), and 4-1BB (Melero et al., 1997).

Of these emerging immune checkpoints, LAG-3 is the furthest 
along in clinical development with a fusion protein (IMP321,
Immutep) and an antibody (BMS-986016, BMS) in clinical trials.
The fusion protein was tested as monotherapy in patients with
renal cell carcinoma; it was well tolerated and led to stabilization
of disease in some patients (Brignone et al., 2009).
IMP321 was also tested in combination with paclitaxel chemo- 
therapy in patients with metastatic breast cancer, which led 
to an objective response rate of 50% (Brignone et al., 2010). 
Based on these promising results, a phase III clinical trial is 
expected to begin accrual in 2015. Other clinical trials are 
ongoing with an antibody against LAG-3 (BMS-986016), which
is also being tested in combination with anti-PD-1 (nivolumab)
(NCT01968109, http://www.clinicaltrials.gov). TIM-3 is another
immune checkpoint for which agents are being developed for
clinical testing. Pre-clinical studies indicate that TIM-3 is co- 
expressed with PD-1 on tumor-infiltrating lymphocytes, and 
combination therapy targeting these two pathways improves 
anti-tumor immune responses (Sakuishi et al., 2010). Finally, an 
antibody targeting VISTA was recently shown to improve anti- 
tumor immune responses in mice (Le Mercier et al., 2014), with 
clinical development soon to follow. Again, these agents repre- 
sent only a partial list of the immune checkpoint agents that 
are currently under development for clinical testing, with expec- 
tations that they will be tested in combination strategies based 
on in-depth analyses of human tumors to provide an understand- 
ing of co-expression of these, and other immunologic targets, to 
guide rational combinations. 

Regarding the co-stimulatory molecules, OX40 and 4-1BB,
which are members of the TNF-receptor superfamily, are 
furthest along in clinical development. A murine anti-OX40 anti- 
body, given as a single dose, was tested in a phase I clinical trial 
and found to have an acceptable safety profile, as well as evi- 
dence of anti-tumor responses in a subset of patients (Curti 
et al., 2013). Humanized antibodies against OX40 are expected
to enter clinical trials in 2015. Anti-4-1BB (BMS-663513) is a fully
humanized monoclonal antibody that has been tested in a phase
I/II study in patients with melanoma, renal cell carcinoma, and
ovarian cancer, with promising clinical responses, as well as 
toxicities, especially at higher doses, which led to re-evaluation 
of the dose and schedule of treatment (Sznol et al., 2008, 
J. Clin. Oncol., abstract). Currently, there are five clinical trials 
with anti-4-1BB (urelumab, BMS-663513) that are recruiting patients
with various tumor types (http://www.clinicaltrials.gov),
including combination with anti-PD-1 (nivolumab), with data ex- 
pected to be presented from these trials during the next 1 to 2 
years. The third co-stimulatory molecule is inducible co-stimu- 
lator (ICOS), a member of the CD28/B7 family whose expression 
increases on T cells upon T cell activation. ICOS+ effector T cells
(Teff), as opposed to ICOS+ regulatory T cells (Treg), increase
after patients receive treatment with anti-CTLA-4 (Liakou et al., 
2008), correlating with clinical benefit in a small retrospective 
study (Carthon et al., 2010). ICOS thus may serve as a pharmacodynamic
biomarker to indicate that anti-CTLA-4 has “hit its
target,” enhancing T cell activation (Ng Tang et al., 2013). Also,
combining agonistic targeting of ICOS with blockade of
CTLA-4 can lead to improved anti-tumor immune responses
and tumor rejection in mice (Fan et al., 2014). Anti-ICOS antibodies
are expected to enter clinical trials in 2015. It is likely
that combination therapy to simultaneously engage co-stimula- 
tory pathways and limit inhibitory pathways will be a successful 
path forward to provide clinical benefit. Importantly, based on 
the profile of toxicities observed to date, it will be critical 
to closely monitor these combination strategies for potential 
adjustments of dosage and management of toxicities that may 
arise. 

Reconciliation: Curative Therapeutic Combinations 

Figure 1. Combination Therapy May Improve Anti-tumor Responses

Depiction of tumor cells dying as a result of genomically targeted therapies
with release of tumor antigens; tumor antigens are taken up by APCs and are
presented in the context of B7 costimulatory molecules to T cells; T cells
recognize antigens on APCs to become activated; activated T cells also
upregulate inhibitory checkpoints such as CTLA-4 and PD-1; immune checkpoint
therapy prevents attenuation of T cell responses, thereby allowing T cells to kill
tumor cells; and T cells may differentiate into memory T cells that can
reactivate in the presence of recurrent tumor.

The last few decades have witnessed the emergence of two
effective but fundamentally different strategies for cancer therapy,
each with its own strengths and weaknesses. Genomic-guided
identification of mutations that drive cancer has led to
the development of drugs that result in remarkable responses in
the majority of patients whose tumors have the targeted lesion, 
but the responses are relatively short-lived. As was the case 
with chemotherapies, it is not unreasonable that combinations 
of genomically targeted agents will be more powerful against 
cancer than single agents. It is possible that the use of multiple 
agents may enhance their effectiveness in terms of increasing 
overall survival. However, the myriad of mechanisms of acquired 
resistance and the complexity of the target landscape due to 
inherent genomic instability may prove extremely difficult to 
overcome through the sole use of genomically targeted strategies
when attempting to achieve a cure. In contrast, immune checkpoint
therapy is inherently multivalent because targeting a single 
checkpoint can potentially release T cells with specificity for 
peptides derived from many different antigens present in a 
tumor, including differentiation antigens, cancer-testis antigens, and
even neoantigens generated by mutational events inherent in the genomic
instability that drives cancer (Snyder et al., 2014; Linnemann 
et al., 2015). As a result of the generation of improved anti-tumor 
T cell responses, immune checkpoint therapy results in durable 
responses but only in a fraction of patients. As discussed in the 
previous sections, it is certainly possible to target multiple 
immune checkpoints with different mechanisms for improved 
anti-tumor responses in greater numbers of patients. Will pa- 
tients benefit from combination of these two strategies? 

Efforts to combine molecularly targeted agents and immuno- 
therapy have already begun. A phase I clinical trial with agents 
that inhibit receptor tyrosine kinases, sunitinib or pazopanib,
in combination with anti-PD-1, was recently reported and 
showed promising overall response rates of 40%-50% in pa- 
tients with metastatic renal cell carcinoma (RCC) (Amin et al., 
2014, J. Clin. Oncol., abstract). These types of combinations 
will require further follow-up to evaluate for survival and durability 
of responses. An area that has not yet received enough attention 
is the immunological impact of genetically targeted agents. 
Vemurafenib, an FDA-approved BRAF inhibitor used for the 
treatment of melanoma, has been shown to increase expression 
of tumor antigens and MHC molecules (Frederick et al., 2013), 
increasing the sensitivity of the tumor cells to immune attack. 
Vemurafenib also has potent effects on T cells, enhancing the 
effects of antigen-mediated activation, perhaps as a result of 
enhanced activation of the MAP kinase pathway after T cell 
antigen receptor signaling (Atefi et al., 2014). These data sug- 
gest that certain agents may be well suited for combination 
with immunotherapy. However, a clinical trial testing a BRAF inhibitor
(vemurafenib) in combination with anti-CTLA-4 (ipilimumab)
was terminated due to hepatotoxicity (Ribas et al., 2013).
A second clinical trial with a BRAF inhibitor (dabrafenib) in combination
with anti-CTLA-4 (ipilimumab) is currently ongoing, and
preliminary data indicate that this combination appears to be
well tolerated (Puzanov et al., 2014, J. Clin. Oncol., abstract),
which highlights the need to consider differences in drugs,
dose, and/or schedule when evaluating agents for combination
strategies. Understanding how different genetically targeted
agents affect the responsiveness to immunotherapy may help
guide choices of combinations of drugs.

Figure 2. Improved Overall Survival as a Result of Combination
Therapy

Depiction of Kaplan-Meier survival curve with genomically targeted agents
(blue line) as compared to standard therapies (purple line), indicating an
improvement in median overall survival but lack of durable responses;
improved median overall survival and durable responses in a fraction of
patients treated with immune checkpoint therapy (green line); possibility for
improved median overall survival with durable responses for the majority of
patients in the setting of combination treatment with genomically targeted
agents and immune checkpoint therapy (red line).

From a mechanistic perspective, it is possible that combina- 
tion strategies with immune checkpoint therapies and genomi- 
cally targeted agents will result in induction of immune memory, 
leading to more durable control of tumor growth than what is 
achievable with either modality alone. Genomically targeted 
therapies with high objective response rates actually could serve 
as “cancer vaccines,” inducing the killing of tumor cells and re- 
sulting in the release of tumor antigens and neoantigens, which 
can then be presented by APCs to tumor-specific T cells 
(Figure 1). These T cells would become activated but also upregulate
inhibitory checkpoints such as CTLA-4 and PD-1, which
can be blocked with antibodies to permit enhanced anti-tumor 
T cell responses, including memory T cell responses, to enable 
long-term control of disease and possible cure. In addition, the 
use of targeted agents to directly kill tumor cells, with release 
of tumor antigens, may focus the activated immune response 
generated by immunotherapy agents on tumor antigens rather 
than self-antigens expressed on normal tissues, resulting in 
fewer adverse events. Furthermore, identification of neoantigens 
may result in the development of personalized vaccines 
composed of these neoantigens for novel vaccine strategies 
plus immune checkpoint agents (Gubin et al., 2014; Tran et al., 
2014; Linnemann et al., 2015). 

Although it is clear that clinical responses can be elicited with 
immune checkpoint therapies or genomically targeted agents, it
appears that genomically targeted agents alone tend to improve
median survival without providing long-term durable responses 
(Figure 2, blue line). Targeting immune checkpoints improves 
median survival but remarkably also provides long-term durable 
responses, raising the tail of the survival curve (Figure 2, green 
line). When combined, these therapies are likely to have an addi- 
tive or even synergistic therapeutic effect that not only would 
potentially further improve median survival but would also raise 
the tail of the survival curve, increasing the number of patients 
that appreciate long-term clinical benefit (Figure 2, red line). 
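
The distinction drawn here between median survival and the tail of the survival curve can be made concrete with a small Kaplan-Meier calculation. The sketch below is illustrative only: the follow-up times are synthetic values invented for the example (not data from any trial cited here), and the function names `kaplan_meier` and `median_survival` are hypothetical helpers.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # The estimate steps down only when deaths are observed.
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which S(t) drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Synthetic follow-up times in months (assumed for illustration only).
times = [2, 4, 4, 6, 9, 12, 12, 30, 36, 48, 60]
events = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
curve = kaplan_meier(times, events)
print(median_survival(curve), round(curve[-1][1], 3))
```

In this toy cohort the curve stops falling after the last observed death, so the final value of `curve` is the height of the plateau. A therapy that "raises the tail" would increase that plateau, whereas one that only delays deaths would shift the median without changing it.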

A Future of Curative Cancer Therapies 

Federal funding for research has been overwhelmingly directed 
toward genomically targeted therapies as compared to immune 
checkpoint therapies. The fundamental research that led to the 
identification of CTLA-4 as an immune checkpoint, as well as
the pre-clinical studies showing the potential of its blockade in 
cancer therapy, were funded by the National Cancer Institute, 
but since then, there have been no major initiatives to accelerate 
progress in this area. Given the durability of the responses that 
have been obtained with immune checkpoint therapies, it seems 
reasonable also to allocate enough funds and resources to 
research focused on immune checkpoint therapies and combi- 
nation therapy of genomically targeted agents and immuno- 
therapy with promising curative potential. Efforts to determine 
the impact of genomically targeted therapies on the immune sys- 
tem should also be prioritized, as they will help to identify which 
agents can enhance anti-tumor T cell responses and guide the 
choice of combinations from the two classes of agents. At this 
stage, it does not seem a stretch to say that increasing funding 
to combination therapies will be key to development of new 
safe treatments that may prove to be curative for many patients 
with many types of cancer. 
In Brief

Even when a stimulus invariably activates 
a sensory neuron, the motor output/ 
behavior is probabilistic because of 
variability in the network state of 
interneurons downstream of the sensory 
neuron. Manipulating the activity of these 
interneurons relative to one another can 
drive olfactory responses in C. elegans to 
be more deterministic. 



Highlights 

• Interneurons in an olfactory circuit have variable responses 
to a fixed odor input 

• Interneurons participate in collective network states that 
correlate with behavior 

• Reliability of the AIB interneuron’s odor response depends 
on the network activity state 

• Chemical synapses from the RIM interneuron increase 
variability of the odor response 
SUMMARY

Variability is a prominent feature of behavior and is an 
active element of certain behavioral strategies. To 
understand how neuronal circuits control variability, 
we examined the propagation of sensory information 
in a chemotaxis circuit of C. elegans where discrete 
sensory inputs can drive a probabilistic behavioral 
response. Olfactory neurons respond to odor stimuli 
with rapid and reliable changes in activity, but down- 
stream AIB interneurons respond with a probabilistic 
delay. The interneuron response to odor depends on 
the collective activity of multiple neurons — AIB, RIM, 
and AVA — when the odor stimulus arrives. Certain 
activity states of the network correlate with reliable 
responses to odor stimuli. Artificially generating 
these activity states by modifying neuronal activity 
increases the reliability of odor responses in inter- 
neurons and the reliability of the behavioral response 
to odor. The integration of sensory information with 
network states may represent a general mechanism 
for generating variability in behavior. 

INTRODUCTION 

Variability is intrinsic to behavior. The behavioral response of 
individuals to a defined sensory stimulus varies from trial to trial, 
even when it is predictable on average. Although variability may 
limit task performance, both behavioral and theoretical analyses 
suggest that it can also be a creative element of behavioral strategies
(Thrun, 1992; Hessler and Doupe, 1999; Ölveczky et al.,
2005; Turner and Brainard, 2007; Chaisanguanthum et al., 
2014). In foraging animals, behavioral variation over short and 
long timescales allows efficient exploration of environments 
with unevenly distributed resources (Charnov, 1976; Humphries 
et al., 2010). In an analogous fashion, computer machine- 
learning algorithms use variability to escape local minima and 
reach global optima (Kirkpatrick et al., 1983; Mitsutake et al., 
2013). Game-theoretical approaches suggest that variable stra- 
tegies are often the best responses to unpredictable conditions, 
particularly in the presence of competitors or predictors (Harsa- 
nyi, 1973). At a neuronal level, intrinsically generated variability 
provides a substrate for reward learning, and increased variability
has been linked to enhanced learning in motor tasks
(Ölveczky et al., 2005; Turner and Brainard, 2007; Chaisanguanthum
et al., 2014).

Trial-to-trial variability in responses to a sensory stimulus can 
result from several mechanisms. There is unavoidable noise in 
sensory systems operating near their detection or discrimination 
thresholds (Barlow et al., 1971; Lillywhite and Laughlin, 1979; 
Bialek, 1987). This stochastic noise decreases precision, but it 
can enhance sensitivity to weak signals (Benzi et al., 1981; Longtin
et al., 1991). At subsequent levels, noise in synaptic transmission
or cellular properties can alter signal propagation at any
point between sensory and motor systems. Finally, the state of 
the neuronal network when a signal arrives can influence the 
network response, especially if its dynamics are highly sensitive 
to initial conditions (Rajan et al., 2010). However, it is challenging
to ascribe single-trial variation to a precise source in complex 
systems in which the neuronal source of behavioral variation 
must be indirectly inferred from population measurements of 
neuronal activity. 

The compact nervous system of the nematode worm 
Caenorhabditis elegans, which has only 302 neurons and about 
7,000 connections (White et al., 1986), provides an opportunity 
to address the neuronal sources of behavioral variability. Vari- 
ability is an explicit element of C. elegans behavioral strategies 
for locating attractants. As first described in bacteria, a biased 
random walk allows organisms to approach an attractant 
source by changing their turning rates on the basis of whether 
stimulus concentrations are increasing or decreasing (Berg and 
Brown, 1972). In this probabilistic behavior, the rate of turning 
is predictable, but individual reorientation events are not. 
C. elegans has probabilistic reversal (reorientation) responses 
to odors, tastes, and temperature associated with chemotaxis 
and thermotaxis behaviors (Pierce-Shimomura et al., 1999; 
Clark et al., 2007). The sensory neurons and circuits for these 
behaviors have been extensively characterized, but it is not 
known where in the circuit a decision is made to reorient 
movement. 
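
The biased random walk described above can be illustrated with a minimal simulation. This is a sketch under simplifying assumptions, not a model of C. elegans locomotion: the walk is one-dimensional, concentration is taken to equal position, and the two turning probabilities are arbitrary values chosen for the example.

```python
import random


def biased_random_walk(steps, p_turn_up, p_turn_down, rng):
    """1D walk along a linear gradient where concentration equals position.

    p_turn_up   -- reorientation probability after a step up the gradient
    p_turn_down -- reorientation probability after a step down the gradient
    Returns the final position and the number of reorientation events.
    """
    x = 0
    heading = rng.choice((-1, 1))
    turns = 0
    for _ in range(steps):
        x += heading
        # heading == +1 means the last step increased the concentration.
        p_turn = p_turn_up if heading > 0 else p_turn_down
        if rng.random() < p_turn:
            # A reorientation picks a new direction at random;
            # it is not aimed at the attractant source.
            heading = rng.choice((-1, 1))
            turns += 1
    return x, turns


rng = random.Random(0)  # fixed seed so the run is repeatable
biased_x, biased_turns = biased_random_walk(1000, 0.1, 0.5, rng)
print(biased_x, biased_turns)
```

Each reorientation is undirected and individually unpredictable, yet the walker drifts up the gradient because runs in the favorable direction last longer. This matches the point made in the text: the turning rate is predictable, but individual reorientation events are not.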

C. elegans neurons fall into three computational levels: sen- 
sory neurons that gather information, motor neurons that syn- 
apse onto muscle, and extensively interconnected interneu- 
rons. C. elegans chemotaxis to attractive odors such as 
isoamyl alcohol (IAA) is initiated by two AWC olfactory neurons.
Attractive odors decrease AWC calcium levels and suppress 
reversal behaviors as part of a biased random walk strategy, 
whereas odor removal increases AWC calcium and stimulates 
reversals (Chalasani et al., 2007; Albrecht and Bargmann, 
2011). The AWC calcium response, which is likely correlated
with depolarization, is highly reliable from trial to trial, even after
dozens of odor presentations (Larsch et al., 2013). By contrast,
the reversal response is probabilistic. Even under well-controlled
conditions, animals may or may not reverse on individual
trials, regardless of the strength of the AWC calcium response
(Larsch et al., 2013). Most reversals rely upon the two AVA command
interneurons, which synapse onto motor neurons that
control the final common pathway for transitions from forward
to backward movement (Chalfie et al., 1985). An increase in
AVA calcium activity consistently correlates with the beginning
of a reversal, and a decrease with its termination (Chronis
et al., 2007; Ben Arous et al., 2010; Faumont et al., 2011).
Thus, the variability in the behavioral response results from variable
transmission of information from the AWC sensory neuron
to AVA command neurons.

Figure 1. Calcium Dynamics of AIB, RIM,
and AVA Neurons in Response to Odor

(A) Simplified wiring diagram showing AWC sensory
neurons and three interneurons in a circuit
linking AWC to reversal behavior, and the number
of direct synapses between each pair of neurons
(White et al., 1986). A more complete circuit appears
in Figure 7.

(B) Light-induced reversal behaviors in wild-type
animals expressing Channelrhodopsin2 (ChR2) in
specific neurons, showing the instantaneous
fraction of animals reversing; the total percentage
of animals that respond is higher (Figure S1A). The
low light levels used here (0.025 mW/mm²) did not
activate the endogenous C. elegans light avoidance
response (black line).

(C) Single frame showing AIB, RIM, and AVA neurons
expressing GCaMP3 in an animal restrained
in the microfluidic imaging chip, and schematic
showing the head of the animal and location of
neurons and processes. Dashed lines represent
processes on the contralateral side of the head; the
AIB, RIM, and AVA neurons from the contralateral
side are not shown. Scale bar, 10 μm.

(D) Averaged calcium responses to a 1-min exposure
to 92 μM IAA (top) and individual traces
(bottom). AIB, RIM, and AVA were recorded
simultaneously and aligned in the same order in
each panel (n = 83); AWC was recorded independently
(n = 35). Calcium dynamics were normalized
to peak and trough values for each trace before
averaging. Shaded area is SEM.

The C. elegans connectome provides a framework to examine 
the intermediate steps of information propagation at the single- 
neuron level. Here we show that the AIB and RIM interneurons 
that link AWC to AVA integrate sensory information with ongoing 
network states to produce neuronal and behavioral variability. 
We find that the instantaneous activity state of the integrating
network can predict its response to odor stimuli and that artificially
generating certain activity states by modifying RIM activity can
increase the reliability of AVA responses and behavior. Internal
network states may serve as a source of variability that influences
the neuronal response to sensory input and ultimately the animal’s
response to its environment.

RESULTS 

Odor-Evoked Calcium Responses in AIB, RIM, and AVA 

Among several neuronal pathways connecting AWC to motor 
output, four pairs of neurons (AWC, AIB, RIM, and AVA) represent
a starting point for defining a connectivity diagram for reversals
(White et al., 1986; Figure 1A). The C. elegans wiring diagram
predicts a single direct synaptic connection between AWC sen- 
sory neurons and AVA backward command neurons, but a much 
greater number of indirect connections. Most reversals initiated 
by AWC require the two AIB interneurons, which are major syn- 
aptic targets of AWC and many other sensory neurons (Gray 
et al., 2005; White et al., 1986). AIB has a few direct synapses 
onto AVA and a much stronger indirect connection to AVA 
through the two RIM interneurons, which are connected to 
both AIB and AVA by chemical and electrical synapses. RIM neu- 
rons also form neuromuscular junctions that affect head move- 
ments, which were not studied here. 
AWC, AIB, and AVA stimulate odor-evoked and spontaneous
reversals (Gray et al., 2005; Chalasani et al., 2007; Guo et al., 
2009), but experiments conducted in different conditions and 
genetic backgrounds have led to ambiguous conclusions about 
whether RIM stimulates or inhibits reversals (Gray et al., 2005; 
Guo et al., 2009; Piggott et al., 2011). To clarify this relationship,
we depolarized each neuron type individually in wild-type animals
by cell-specific expression and activation of Channelrhodopsin2
(Nagel et al., 2005). Acute light stimulation of AWC,
AIB, RIM, or AVA resulted in increased reversal behaviors (Figure 1B;
Figure S1), suggesting that the net activity of each neuron
promotes reversals. 

To monitor information flow between these neurons, we 
imaged calcium in animals expressing the genetically encoded 
calcium indicator GCaMP3 (Tian et al., 2009) in AWC, AIB,
RIM, and AVA individually and in combinations. Animals were 
restrained in a small microfluidic chamber that allowed precise 
delivery and removal of odor stimuli to the nose (Figures 1C 
and S1) (Chronis et al., 2007). AWC and AIB have previously
been examined in this imaging system (Chalasani et al., 2007), 
but RIM and AVA have not. AWC responds to odor addition 
with an immediate reduction in calcium and to odor removal 
with a sharp calcium rise followed by a return to baseline levels 
(Figure 1D). AIB, RIM, and AVA responded to odor addition
with a calcium decrease relative to the average baseline, and 
to odor removal with a slow return toward the baseline, without 
an overshoot (Figure 1D). The interneuron calcium responses appeared
smaller in magnitude and slower than those of AWC neurons.
We focused subsequent analysis on odor addition, in part
because of its robustness and in part because the more complex 
response to odor removal is regulated by odor history as well as 
concentration (Chalasani et al., 2007). 
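
The Figure 1D legend states that calcium dynamics were normalized to peak and trough values for each trace before averaging. A minimal sketch of that step follows. It assumes each trace is a plain list of fluorescence samples on a shared time base, and the function names `normalize_trace` and `average_traces` are hypothetical, not taken from the authors' analysis code.

```python
def normalize_trace(trace):
    """Rescale one trace so its trough maps to 0 and its peak to 1."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        # A flat trace has no dynamic range; return zeros by convention.
        return [0.0 for _ in trace]
    return [(v - lo) / (hi - lo) for v in trace]


def average_traces(traces):
    """Normalize each trace, then average sample by sample."""
    normed = [normalize_trace(t) for t in traces]
    n = len(normed)
    return [sum(column) / n for column in zip(*normed)]


# Two made-up traces with very different raw amplitudes.
mean_trace = average_traces([[0, 1, 2], [10, 30, 20]])
print(mean_trace)  # [0.0, 0.75, 0.75]
```

Normalizing first keeps a few bright cells from dominating the mean. It also means that an average sitting at an intermediate value can arise from a mixture of traces that are individually near 0 or near 1.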

AIB, RIM, and AVA Have Distinct High and Low Activity 
States 

Examination of individual traces revealed an unexpected feature 
of calcium signals in AIB, RIM, and AVA not visualized in the av- 
erages: the neurons appeared to switch between long-lasting 
high and low calcium states, spending little time at intermediate 
values (Figures 1D and 2A-2C). Furthermore, transitions between
high and low states for all three neurons often occurred
at the same time, both spontaneously in buffer (Figure 2A) and 
in response to odor stimuli (Figure 2C). Quantitative analysis 
confirmed that AIB, RIM, and AVA calcium signals had a bimodal 
distribution, with a strong bias toward distinct high and low 
states (Figure 2B). While C. elegans neurons lack classical so- 
dium-based action potentials, they do have voltage-activated 
channels that can generate active properties such as bistability 
(Goodman et al., 1998; Mellem et al., 2008).

Calcium signals in C. elegans neurons are generally correlated 
with depolarization, but can vary between cellular compartments 
(Chalasani et al., 2007; Hendricks et al., 2012). The presynaptic 
calcium that drives neurotransmitter release is most relevant to 
neuronal function and can be observed directly by monitoring 
GCaMP signals in axons. Like somatic responses, calcium sig- 
nals in the axons of AIB, RIM, and AVA neurons were bimodal, 
with long-lasting high and low states (Figure S2A). They began 
to rise or fall at the same time as somatic calcium signals, but
the response magnitude peaked more quickly in the axon, especially
for RIM (Figure 2D). Odor-evoked activity in axons rose and
fell with similar dynamics in AIB, RIM, and AVA (Figure 2D). 
Because of the proximity of AIB, RIM, and AVA axons, simulta- 
neous imaging was only possible for cell bodies. 

We defined distinct ON and OFF states for AIB, RIM, and AVA 
on the basis of the beginning of the rise or fall in activity, which 
was synchronous in cell bodies and axons (Figure S2B). Both 
ON and OFF states varied greatly in duration, with lengths that 
ranged from a few seconds to several minutes in animals held 
in constant conditions (Figure S2C). 
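
One simple way to turn a bimodal calcium trace into ON and OFF epochs is sketched below. Note the assumption: the text defines states by the onset of each rise or fall, whereas this sketch substitutes a two-threshold (hysteresis) rule on a normalized trace, which is a common stand-in and not the authors' stated procedure. The thresholds 0.6 and 0.4 and both function names are illustrative choices.

```python
def classify_states(trace, high=0.6, low=0.4, initial=False):
    """Label each sample ON (True) or OFF (False) using two thresholds.

    The state switches ON only above `high` and OFF only below `low`,
    so small fluctuations between the thresholds do not cause flips.
    """
    states = []
    on = initial
    for v in trace:
        if not on and v > high:
            on = True
        elif on and v < low:
            on = False
        states.append(on)
    return states


def state_durations(states, dt):
    """Collapse a state sequence into (state, duration_in_seconds) runs."""
    runs = []
    for s in states:
        if runs and runs[-1][0] == s:
            runs[-1][1] += dt
        else:
            runs.append([s, dt])
    return [(s, d) for s, d in runs]


# Made-up normalized trace sampled once per second.
trace = [0.1, 0.2, 0.7, 0.9, 0.5, 0.45, 0.3, 0.1, 0.8]
states = classify_states(trace)
print(state_durations(states, dt=1.0))
```

The list of run lengths is the quantity whose spread the text describes, from a few seconds to several minutes.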

Correlated activity among a set of neurons that includes AIB 
and AVA has also been observed in whole-brain imaging (Schrödel
et al., 2013; Prevedel et al., 2014), in agreement with these
observations. To assess the degree of correlation between the 
interneurons, AIB, RIM, and AVA were simultaneously imaged 
and then independently classified into ON or OFF states for an- 
imals imaged for 1 min in buffer (Figures S2 and S3). Most 
neuronal state transitions were correlated to within a few sec- 
onds, which is at the limit of resolution of the binary classification 
scheme. AVA produced the clearest transitions between ON and 
OFF states, whereas the slower calcium dynamics of RIM led 
to less precise transition assignments. AIB had relatively fast 
calcium dynamics and also displayed more low-amplitude,
high-frequency (second, subsecond) dynamics in the ON state that
made shorter ON/OFF assignments less precise (Figures 2A 
and 2C; Figure S3). 
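
The binary ON/OFF classification described above can be illustrated with a short sketch. The authors' actual procedure is given in their Extended Experimental Procedures and is not reproduced here; the function name, the two-threshold (hysteresis) rule, and the cutoff values below are assumptions chosen only for illustration.

```python
import numpy as np

def classify_on_off(trace, high=0.6, low=0.4):
    """Label each sample ON (True) or OFF (False) using two thresholds.

    `trace` is assumed to be normalized to [0, 1] (trough to peak).
    The `high` and `low` cutoffs are illustrative, not values from the paper.
    """
    trace = np.asarray(trace, dtype=float)
    states = np.zeros(trace.shape, dtype=bool)
    on = trace[0] >= high  # initial state guess from the first sample
    for i, value in enumerate(trace):
        if not on and value >= high:
            on = True       # rise through the upper cutoff: switch ON
        elif on and value <= low:
            on = False      # fall through the lower cutoff: switch OFF
        states[i] = on
    return states
```

Using two cutoffs rather than one keeps small, fast fluctuations (such as those noted for AIB in the ON state) from being scored as brief state changes, at the cost of slightly delayed transition times.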

Of the eight possible binary states of AIB, RIM, and AVA, only 
three occurred for time frames longer than 10 s: (1) all neurons 
ON, (2) all neurons OFF, or (3) AIB only (AIB-ON, AVA/RIM- 
OFF) (Figure 2E; Figure S2D). The AIB-only state could last for 
many seconds when it did occur (Figure S2D). Mutual informa- 
tion analysis confirmed that AVA and RIM are more tightly 
coupled to one another than to AIB (Experimental Procedures; 
Figure S3). Transitions between the three network states 
continued over several hours in animals that were physically im- 
mobilized in the absence of externally applied sensory stimuli or 
pharmacological agents. 
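
As a rough illustration of how the eight possible binary network states could be tallied, the sketch below finds every joint (AIB, RIM, AVA) state that persists for longer than 10 s. It assumes each neuron has already been classified into a boolean ON/OFF array; the function name, the 3-bit encoding, and the default frame rate are illustrative assumptions, not the authors' code.

```python
import numpy as np

def long_network_states(aib, rim, ava, fps=10.0, min_seconds=10.0):
    """Return the set of (AIB, RIM, AVA) states held longer than `min_seconds`.

    Inputs are equal-length boolean ON/OFF arrays sampled at `fps` (assumed).
    """
    # Encode the joint state as a 3-bit integer: AIB=4, RIM=2, AVA=1.
    code = 4 * np.asarray(aib, int) + 2 * np.asarray(rim, int) + np.asarray(ava, int)
    # Indices where the joint state changes mark run boundaries.
    edges = np.flatnonzero(np.diff(code)) + 1
    starts = np.concatenate(([0], edges))
    stops = np.concatenate((edges, [code.size]))
    found = set()
    for start, stop in zip(starts, stops):
        if (stop - start) / fps > min_seconds:
            c = int(code[start])
            found.add((bool(c & 4), bool(c & 2), bool(c & 1)))
    return found
```

In this encoding the three states reported above correspond to `(True, True, True)`, `(False, False, False)`, and `(True, False, False)`.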

Odor Addition Drives State Transitions in AIB, RIM, and 
AVA Interneurons 

IAA addition induces a rapid and reliable suppression of activity
in AWC sensory neurons (Figure 1D; Chalasani et al., 2007). Odor
effects on interneurons were more variable and were most easily 
understood in the context of the distinct ON and OFF network 
states. 

We first considered each interneuron independently. The 
average intermediate response to odor addition (Figure 1D) 
resulted from some neurons that responded very strongly soon 
after odor addition and others that did not respond at all. This ef- 
fect was most evident when neuronal activity profiles were 
binned according to their state just prior to odor addition and 
sorted on the basis of the time to the next transition (Figure 3A). 
A majority of the AIB, RIM, and AVA neurons in the ON state re- 
sponded to odor with a transition to the OFF state within a few 
seconds, and a minority did not. Neurons in the OFF state re- 
mained OFF after odor addition, presumably because they could 
not be suppressed below this apparent baseline. To quantify the 
Figure 2. AIB, RIM, and AVA Have Bistable, 
Correlated Activity States 

(A) Representative calcium dynamics in simulta- 
neously recorded AIB, RIM, and AVA neurons from 
three animals in buffer. 

(B) Probability distribution of normalized calcium 
activity showing bimodal distribution of activity in 
AIB, RIM, and AVA neurons in buffer (n = 83). 

(C) Representative calcium dynamics in simulta- 
neously recorded AIB, RIM, and AVA neurons from 
three animals in response to a 1-min odor pulse 
(gray bars). 

(D) Calcium dynamics in simultaneously imaged 
cell bodies (color) and processes (gray), aligned to 
ON-OFF and OFF-ON transitions (n = 17). 
Shaded areas are SEM. Calcium dynamics from 
individually imaged neurons are normalized to 
peak and trough values for each trace in all panels. 

(E) The eight possible network states and the 
observed frequencies of each state lasting longer 
than 10 s during a 1-min period prior to odor 
exposure (n = 83). 
effect of odor addition, we measured the time after odor exposure
at which each interneuron switched from an ON to OFF 
state and compared with controls switched from buffer to buffer 
(Figures 3B-3D). Odor shifted the entire distribution of AIB, RIM, 
and AVA neurons toward OFF states within a few seconds, an ef- 
fect that was strongest for AIB. 

The responses of simultaneously imaged AIB, RIM, and AVA 
neurons to odor addition were highly correlated with one 
another; in most cases, either all three neurons shifted from an 
ON to an OFF state at similar times or none did (Figure 1D; Figure S3E).
These results suggest that odor 
drives collective all-or-none transitions 
in multiple interneurons to shift network 
states. 

Odor addition had other effects on 
interneuron activity as well (Figures 3D- 
3H). For all three interneurons, the median 
duration of the initial OFF response in- 
duced by odor was 5- to 10-fold longer 
than in buffer controls (Figure 3E; Fig- 
ure S2C), and the total fraction of time 
spent in the ON state decreased 2- to 4- 
fold (Figure 3F). In addition to large-scale 
ON-OFF transitions, odor often elicited a 
rapid, small-amplitude decrease in the 
calcium signal of the AIB neuron, which 
was not always accompanied by a full 
OFF state or by similar changes in the 
RIM and AVA neurons (Figures 3G and 
3H; Figures S3C and S3D). 

The odor-regulated, inefficient, all-or- 
none transitions in neuronal activity of 
AIB, RIM, and AVA seemed of particular 
interest for probabilistic reversal behavior: 
they captured both the delay and the 
variability characteristic of the behavioral 
output (Larsch et al., 2013). These transitions transform reliable 
AWC activity into variable AVA responses. 

RIM Neurons Create Variability in AIB Odor Responses 

An insight into the source of interneuron variability was provided 
by the network state in which only AIB was ON. In AIB-only 
states, unlike all-ON states, odor addition always drove a transition
to the all-OFF state (Figures 3I and 3J). This class of events 
largely explained the higher fraction of odor response in AIB 
neurons compared with RIM and AVA. Moreover, the AIB-only 
neurons responded more rapidly to odor than AIB neurons in 
an all-ON state (Figure 3K). The rapid and reliable AIB-only 
response to odor indicates that a network property correlated 
with RIM and AVA activity antagonizes the AIB odor response. 

To explore the functional importance of the AIB-only network 
state, we manipulated neuronal activity with a chemogenetic reagent,
the histamine-gated chloride channel HisCl1. C. elegans 
does not use histamine as an endogenous neurotransmitter, 
but C. elegans neurons that express Drosophila HisCl1 are 
acutely hyperpolarized within a few minutes of exposure to exog- 
enous histamine (Pokala et al., 2014). This reagent enabled the 
silencing of specific neurons under conditions compatible with 
neuronal activity imaging. HisCl1 was individually expressed in 
AIB, RIM, or AVA neurons, or in pairwise combinations of these 
neurons, to generate histamine-sensitive strains. These strains 
also expressed GCaMP5 in AIB, RIM, and AVA. 

Acute silencing of both AVA and RIM eliminated their calcium 
transients, recapitulating the AIB-only state (Figure 4A). In ani- 
mals in which RIM and AVA were acutely silenced with histamine, 
odor addition shifted AIB neurons from the ON to the OFF state 
with high reliability and a significantly shorter latency than con- 
trols (Figures 4B, fifth row, 4C, and 4D). A similar effect was 
observed when RIM and AVA were removed from the circuit 
using an nmr-1::ICE transgene that results in programmed cell 
death of RIM, AVA, and four other neuron classes (Zheng et al., 
1999). AIB neurons in nmr-1::ICE animals responded rapidly 
and reliably to odor addition (Figure S4), resembling wild-type 
animals in the AIB-only state. Thus, the activity of RIM and AVA 
neurons delays and diminishes odor responses in AIB. 

In addition to imitating endogenous network states, the cell-specific
HisCl1 transgenes made it possible to generate alternative
network states. Thus, with appropriate transgenes it was 
possible to silence only AVA or only RIM with HisCl1, although 
these states were not normally observed. Silencing either RIM 
or AVA decreased the latency and increased the efficiency of 
AIB odor responses, with RIM having a stronger effect (Figures 
4B, third and fourth rows, 4C, and 4D). These results indicate 
that AIB interneurons are subject to feedback from RIM and 
AVA neurons and that this feedback is one source of variability 
in the AIB odor response. 

Interactions between AIB, RIM, and AVA Shape Odor 
Responses and Network States 

In the all-ON state, AIB, RIM, and AVA interneurons respond to 
odors after a variable delay, if at all. Using HisCl silencing, we 
probed the effect of each of these three neurons on the odor re- 
sponses of the others, alone and in combination. 

Silencing either RIM or AVA increased the speed and reliability 
of the odor response in AVA or RIM, respectively, as well as AIB 
(Figures 4B-4D). Thus, RIM and AVA can each act independently 
to antagonize odor responses. 

Silencing AIB had little effect on the onset of odor responses 
in RIM and AVA (Figures 4C and 4D), indicating that other 
inputs such as the direct AWC to AVA synapses are sufficient 
to drive odor responses. However, silencing AIB considerably 
decreased the duration of the OFF responses induced by odor 
in RIM and AVA (Figures 4E and 4F), indicating that AIB stabilizes 
odor-induced OFF states. 



Silencing RIM decreased the correlation of activity between 
AIB and AVA neurons, suggesting that RIM has a key role in syn- 
chronizing network states (Figure 4H). Other interactions in the 
circuit were suggested by pairwise silencing. For example, 
simultaneous silencing of AIB and RIM led to a striking reduction 
in spontaneous AVA activity (Figure 4B, seventh row). This result 
suggests that both AIB and RIM provide excitatory drive to the 
backward command system. None of the manipulations of 
neuronal activity changed the bistable, switchlike behavior of 
the AIB, RIM, and AVA neurons. 
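
The coupling between pairs of neurons is reported as a symmetric uncertainty coefficient (Figure 4H), which is 1 when two neurons are mutually dependent and 0 when they are independent. The sketch below uses the standard definition, U(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), applied to two binary ON/OFF sequences. It is an illustration under that assumption; the authors' Statistical Methods may differ in detail, and the function names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zeros are skipped)."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def symmetric_uncertainty(x, y):
    """U(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) for two binary ON/OFF arrays."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    joint = np.zeros((2, 2))
    np.add.at(joint, (x, y), 1)          # count co-occurrences of (x, y)
    joint /= joint.sum()
    hx = entropy(joint.sum(axis=1))
    hy = entropy(joint.sum(axis=0))
    hxy = entropy(joint.ravel())
    if hx + hy == 0:
        return 0.0                        # both constant: define as 0 here
    return 2.0 * (hx + hy - hxy) / (hx + hy)
```

Normalizing the mutual information by the two entropies makes coefficients comparable between neuron pairs that spend different fractions of time in the ON state.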

Chemical Synapses Mediate the Antagonistic Effects of 
AIB and RIM 

AIB and RIM are strongly coupled to one another in the C. elegans 
wiring diagram, with bidirectional chemical synapses as well as 
gap junctions. RIM and AVA are connected by unidirectional 
chemical synapses and by gap junctions (Figure 5A). To separate 
the contributions of chemical synapses and gap junctions, 
we used cell-specific expression of tetanus toxin light chain 
from Clostridium tetani (TeTx). TeTx reduces presynaptic vesicle 
release by cleaving synaptobrevin/VAMP, but should spare gap 
junctions and neuronal excitability (Schiavo et al., 1992). 

Expression of TeTx in AIB delayed odor responses in RIM and 
AVA and substantially decreased the length of their OFF 
response. Remarkably, AIB::TeTx also delayed odor responses 
and shortened the initial OFF duration in AIB itself, though the 
effect was not as strong as in RIM and AVA (Figures 5B-5D). 
Thus the latency and length of an AIB response to odor depends 
in part on AIB synaptic output, suggesting that AIB synapses 
oppose the antagonistic feedback that decreases AIB reliability. 

Conversely, expression of TeTx in RIM increased the reliability 
of AIB and AVA odor responses and decreased their latency, 
resembling the effects of RIM::HisCl silencing (Figures 5B-5D). 
RIM::TeTx also increased the reliability of the RIM response to 
odor. Indeed, silencing RIM chemical synapses with TeTx con- 
verted the entire network to a near-deterministic state, with 
over 80% of the neurons responding to odor addition with a rapid 
switch from the ON to OFF state that is as fast as the initial AIB 
response (Figures 5B-5G). These results indicate that RIM 
chemical synapses antagonize odor input to generate probabi- 
listic behavior in the interneuron network. 

AVA does not make chemical synapses onto AIB or RIM, but it 
has gap junctions with RIM. To probe the role of AVA feedback 
on the other neurons, we stimulated AVA with the red light-sen- 
sitive cation channel Chrimson (Klapoetke et al., 2014) during 
calcium imaging with GCaMP. Activation of AVA led to simulta- 
neous calcium increase in AVA, RIM, and AIB (Figure 5H), 
demonstrating that feedback from AVA is sufficient to drive 
AIB and RIM into high-activity states. 

RIM Decreases the Reliability of Behavioral Responses 
to Odors 

Although calcium imaging in restrained animals permitted simul- 
taneous imaging of multiple neurons, it precluded analysis of the 
behaviors triggered by odor. To characterize the relationships 
between neuronal activity and behavior, we monitored the 
activity of individual interneurons in freely moving animals, using 
microfluidic arenas suitable for fast switching of odor stimuli 
Figure 3. AIB, RIM, and AVA Calcium Is Reduced by Odor Addition 

(A) Calcium dynamics in simultaneously recorded AIB, RIM, and AVA neurons. Traces from Figure 1D are ordered separately for each neuron according to its 
activity at the point of odor addition (gray bar above heatmaps) and secondarily according to the closest ON to OFF transition for that neuron. Buffer control data 
are in Figure S3. 

(Larsch et al., 2013) (Figure 6A). GCaMP-expressing animals 
were exposed to repeated odor pulses or buffer while recording 
both the responses of individual interneurons and the behavior of 
the animal. As shown previously, AWC activity was reliably 
modulated by odor addition and removal (Figure 6B). On 
average, the activity of AIB, RIM, and AVA also fell when odor 
was added and rose when odor was removed, as it did in 
restrained animals (Figure 6B). 

Alignment of neuronal activity with behavioral status of the an- 
imals showed that the activity of all three interneurons was also 
correlated with reversal behaviors, whether these were spontaneous
(Figure 6C) or induced by odor (Figure S5). This correlation 
was not observed for AWC unless odor stimuli were used. The 
beginning of a reversal coincided with rising AIB, RIM, or AVA 
calcium, and the end of the reversal corresponded to falling cal- 
cium. These results suggest that the activity of AIB, RIM, and 
AVA is correlated in freely moving as well as restrained animals 
and indicate that the all-ON state corresponds to reversals. 
Nonetheless, there were differences in the calcium dynamics 
of restrained and freely moving animals; high-activity ON states 
were shorter in freely moving animals, as has previously been 
noted for AVA (Chronis et al., 2007; Ben Arous et al., 2010;
Faumont et al., 2011). 

Since RIM decreased the reliability of odor responses in AIB, 
RIM, and AVA, it should affect the corresponding behavioral 
responses to odor. To test this prediction, RIM synapses were 
inactivated with TeTx synaptic silencing, and odor pulses were 
delivered under conditions that led to probabilistic reversal re- 
sponses (Figure 6D). As predicted by the imaging experiments, 
RIM::TeTx animals had a more reliable response to odor addition 
than wild-type animals (Figure 6D). Thus, behavioral variability in 
the odor response, like neuronal variability, is increased by RIM. 

DISCUSSION 

Animals navigating a complex environment do not simply repro- 
duce external stimuli through their actions; instead, they shape 
behavior appropriate to conditions. Chemotaxis circuits can 
transform smooth or noisy sensory inputs into discrete, probabi- 
listic reorientations. More generally, animals make choices and 
explore environments through discrete, mutually exclusive 
actions, which are also the basis of experimental behavioral 
paradigms such as go/no-go or forced-choice decisions (Frederick
et al., 2011). Here we consider one question raised by this class of 
behavior: how are such probabilities resolved into decisions? 

Although sensory information can be limiting for performance, 
sensory noise in the C. elegans chemotaxis circuit is not the 
source of probabilistic behavior. AWC responds to odor rapidly 
and reliably in each trial, but at a behavioral level, reorientation 
behavior is probabilistic at odor concentrations 1,000-fold higher 
than the detection threshold (Larsch et al., 2013). Instead, our re- 
sults indicate that interneurons integrate the sensory response 
with ongoing network activity to generate responses that vary 
in their timing and probability. 

The AIB interneurons are major synaptic targets of AWC and are 
also direct or indirect targets of many other sensory neurons (Fig- 
ure 7). Although AIB neurons respond to odors more reliably than 
other interneurons, they do not respond in every trial. However, 
AIB can respond to odors more consistently when RIM or AVA 
is silent. Variability in AIB response to odors thus results from 
interference from neurons closer to the motor response, the back- 
ward command neuron AVA and the interneuron RIM. Effectively, 
sensory input and RIM compete to regulate AIB, which then acts 
with RIM, AVA, and other neurons in a collective network state. 

Both RIM and AIB chemical synapses affect the network. RIM 
synapses are an essential component of feedback onto AIB, and 
indeed the RIM connection to AIB is very strong when both synapse
size and synapse number are taken into account
(www.wormwiring.org) (Figure 7). RIM releases the neurotransmitters 
glutamate, acetylcholine, and tyramine, any of which could poten- 
tially act on AIB. AIB’s own chemical synapses promote its odor 
response, potentially by antagonizing feedback from RIM and 
AVA. AIB uses glutamate as a transmitter, and RIM expresses 
both excitatory and inhibitory glutamate receptors (Hart et al., 
1995; Maricq et al., 1995; Piggott et al., 2011). The detailed properties
of these synapses remain to be determined. Another open 
question is the contribution of the gap junctions linking AIB, RIM, 
and AVA. Optogenetic activation of AVA can drive the network, 
and in the context of the wiring diagram this is likely to involve 
AVA gap junctions with RIM, but these connections do not have 
well-defined genetics or pharmacology and we did not target 
them directly. It should be noted that the temporal resolution of 
calcium imaging is limited, even when using the higher resolution 
provided by examining axons instead of cell bodies; therefore, this 
approach reports the final outcome but not the interactions be- 
tween different chemical synapses and gap junctions. 



(B) Example traces highlighting parameters analyzed in (C)-(F). “Delay” is the duration a neuron remained ON after odor addition. “OFF Duration” is the duration of 
the first OFF response after odor addition. “Fraction ON” is the fraction of time during the 1-min odor pulse that the neuron was ON. 

(C-F and H) Analysis of calcium dynamics from (A) for each neuron. In all panels, responses to odors are in color, and buffer controls are in black or gray. 

(C) Complementary cumulative distribution (CCD) of the ON-OFF delay for each neuron in response to a 1-min odor (red, green, blue) or buffer (gray) pulse. 

(D) Length of the ON-OFF delay after odor addition. 

(E) Initial OFF duration after odor addition. 

(F) For neurons that were ON prior to odor exposure, the fraction of time the neuron was ON during the 1-min odor pulse. For (D)-(F), box and whisker plots show 
median response (circle), 25th and 75th percentiles (boxes), and full distribution (lines). 

(G) An example trace of AIB briefly responding to odor before a full OFF response. The blue box highlights the time frame in (H). 

(H) Staggered mean time derivatives of calcium response for AIB, RIM, and AVA neurons from (A) that were ON prior to odor addition. (Buffer: AIB, n = 23; RIM, n = 24;
AVA, n = 19. Odor: AIB, n = 59; RIM, n = 46; AVA, n = 51.) Shaded regions are ±SEM. 

(I) Schematic diagram of the “AIB-only” state. 

(J) Heatmap of AIB calcium dynamics in response to odor for AIB-only state. 

(K) Complementary cumulative distribution (CCD) of the ON-OFF delay after odor addition for the AIB-only state. 

p values compared with controls were calculated with a Kolmogorov-Smirnov test (C and K) or Wilcoxon rank sum test with Bonferroni correction (D-F and H). 

*p < 0.05, **p < 0.01, ***p < 0.001. 
Figure 4. RIM Hyperpolarization Increases Reliability in AIB and AVA Neurons 

(A) Representative images and traces of spontaneous activity of AIB, RIM, and AVA neurons in animals expressing HisCl1 in RIM and AVA. Left: spontaneous 
activity in buffer in the absence of histamine. Middle: after exposure to 10 mM histamine for 5 min. Right: after exposure to 10 mM histamine for 20 min. 

The C. elegans wiring diagram is dominated by feed-forward 
connections (White et al., 1986; Varshney et al., 2011), but our 
results demonstrate a strong feedback component from ongoing 
network states that evolve on a slow second to minute timescale. 
This feedback is evident as early as the AIB interneurons, which 
are generally considered sensory integrators. The faster compu- 
tations at the sensory level and slower computations in collective 
network states allow multiple timescales of behavior to emerge. 
Feedback from slow network states can be envisioned as inertia 
in the system: once an ON or OFF state is generated, it overrides 
other inputs that must accumulate or wait for the network state to 
decay. We speculate that the collective states of AIB, RIM, and 
AVA represent attractor states, in which different starting points 
lead to a stable, self-reinforcing activity state that is either high or 
low (Hopfield, 1982). The factors determining the duration of the 
network states are unknown. Both ON and OFF states follow 
exponential distributions, but that leaves open a large number 
of mechanisms that could be stochastic, chaotic, or simply 
complex. 
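
For reference, the statement that state durations follow exponential distributions can be written out as below. These are standard properties of the exponential distribution rather than results of this study, and the symbol \tau (a mean state duration) is introduced here only for illustration.

```latex
P(T > t) = e^{-t/\tau}, \qquad
P(T > s + t \mid T > s) = P(T > t), \qquad
\mathbb{E}[T] = \tau
```

The middle expression is the memoryless property: the chance that a state ends in the next interval does not depend on how long it has already lasted. This property is consistent with a random switching process, but it does not by itself distinguish a stochastic mechanism from a chaotic or merely complex one.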

The analysis presented here provides a first view of probabilistic 
behavior, but there are almost certainly other neurons that 
participate in these decisions (Figure 7). Many other neurons are 
synaptically connected to AIB, RIM, and AVA and could affect 
transitions between high and low network states, which in turn 
could affect additional interacting neurons. For example, AVB, the 
forward command neuron, receives strong synaptic input from 
both AIB and RIM and synapses onto AVA; it is likely an element 
of an antagonistic network. Recent whole-brain imaging studies 
of C. elegans have shown that AVA, AIB, and several dozen other 
neurons have correlated activities (Schrödel et al., 2013; Prevedel 
et al., 2014). Our results agree with their conclusion that neurons 
in C. elegans have collective activity states, and the whole-brain 
imaging suggests that many other neurons could contribute to 
all-ON, all-OFF, and other possible network states. 

There is still much to learn about the composition and proper- 
ties of collective neuronal states, particularly in freely moving an- 
imals. Most of our results, as well as the whole-brain imaging of 
Schrödel et al. (2013), were obtained in the presence of a cholinergic
agonist that could have altered network activity. Moreover, 
we and others have observed that neuronal dynamics are altered 
when animals are physically restrained (Ben Arous et al., 2010), 
perhaps because of mechanical inputs or a loss of motor and 
proprioceptive feedback (Kawano et al., 2011). 



The importance of network state on sensory processing has 
long been recognized in mammalian visual cortex and other sys- 
tems. For example, behavioral detection of visual stimuli by hu- 
man subjects can be predicted from ongoing network activity, in- 
dependent of the stimulus (Ress et al., 2000). Selective attention, 
task-specific information, and motor feedback can affect sen- 
sory processing in mammalian visual cortex (Silver et al., 2007; 
Li et al., 2004; Niell and Stryker, 2010). Motor feedback also af- 
fects the gain of visual processing in the fly brain (Maimon 
et al., 2010). 

The integration of sensory information with network states 
has appealing properties from a behavioral standpoint. First, 
a fully determined behavior may be appropriate to certain in- 
trinsically meaningful stimuli — highly toxic environmental condi- 
tions, food, and suitable mates— but most stimuli do not neces- 
sarily produce a predictable outcome and are not themselves 
rewarding. In that context, variability can prevent behavioral 
dead ends and increase real success. Second, different behav- 
iors can emerge by changing the underlying state dynamics, as 
well as sensory properties. For example, C. elegans has a 
much higher probability of spontaneous reversals immediately 
after removal from food than it has an hour later, presumably 
accompanied by differences in AVA and RIM activity, which 
would lead to differences in the transmission of sensory infor- 
mation from AWC to AIB (Gray et al., 2005). Sensory cues 
that are integrated with circuit state could accordingly generate 
different behaviors under different feeding states. Third, adjust- 
ing the strength of different inputs in a probabilistic network 
increases opportunities for plasticity, and indeed AIB and RIM 
participate in C. elegans learning circuits (Ha et al., 2010). 
Further investigation of the generation and control of variability 
may provide additional insight into these more complex behav- 
ioral mechanisms. 

EXPERIMENTAL PROCEDURES 

Standard culture, molecular biology, injection, and optogenetic methods were 
used; details and strain genotypes are in Extended Experimental Procedures. 
For GCaMP and HisCl expression, we used promoters for rig-3 (expressed in 
AVA and some pharyngeal neurons), tdc-1 (RIM and RIC), inx-1 (AIB), and str-2 
(AWC^ON). For ChR2 or Chrimson expression, we used the same promoters 
except for AVA, for which we used Cre/Lox recombination and the intersection 
between nmr-1 and rig-3 promoters (ChR2) and glr-1 and rig-3 promoters 
(Chrimson). 



(B) Heatmaps of calcium dynamics in AIB, RIM, and AVA neurons (columns) in different HisCl-expressing strains (rows) during exposure to a 1-min pulse of 92 µM 
IAA (gray bars) in the presence of histamine. Traces span 3 min and are ordered as in Figure 3A. Neuron diagrams are colored as in Figure 1A, with the histamine- 
silenced neurons shown in white (wild-type [WT], n = 51; AIB::HisCl, n = 27; RIM::HisCl, n = 29; AVA::HisCl, n = 30; RIM,AVA::HisCl, n = 36; AIB,AVA::HisCl, n = 32; 
AIB,RIM::HisCl, n = 30). 

(C-G) Analysis of calcium dynamics from data in (B), as in Figure 3. Color legend designates different HisCl strains. (C) Complementary cumulative distribution 
(CCD) of the ON-OFF delay for each neuron in response to a 1-min odor pulse. (D) Length of the ON-OFF delay after odor addition. (E) Initial OFF duration after 
odor addition. (F) For neurons that were ON prior to odor addition, the fraction of time the neuron was ON during the 1-min odor pulse. For (D)-(F) and (H), box and 
whisker plots show median response (circles), 25th and 75th percentiles (boxes), and full distribution (lines). (G) Staggered mean time derivatives of calcium 
response for AIB, RIM, and AVA neurons that were ON prior to odor addition under different histamine-silenced conditions. Shaded regions are ±SEM. 

(H) Symmetric uncertainty coefficient for neuron pairs after silencing AIB, RIM, or AVA with HisCl (see Statistical Methods). A coefficient of one means both 
neurons are mutually dependent, while a coefficient of zero means they are completely independent. Gray indicates control values without hyperpolarization. 
Note that RIM and AVA are more tightly coupled to one another than to AIB. Silencing RIM significantly decreases the mutual information between AIB and AVA 
neurons. 

p values compared with WT controls were calculated with a Kolmogorov-Smirnov test (C) or Wilcoxon rank sum test with Bonferroni correction (D-H). *p < 0.05, 
**p < 0.01, ***p < 0.001. 
Figure 5. RIM Chemical Synapses Drive Variability in Odor Responses 

(A) Schematic showing synapses affected by tetanus toxin expression. 

(B) Heatmap of calcium dynamics in AIB, RIM, and AVA neurons (columns) in different tetanus toxin-expressing strains (rows) during exposure to a 1-min pulse of 
92 µM IAA (gray bars). Traces span 3 min and are ordered as in Figure 3A. (WT, n = 37; AIB::TeTx, n = 32; RIM::TeTx, n = 31). 

(C-G) Analysis of calcium dynamics from data in (B), as in Figure 3. Color legend in (A) designates TeTx-expressing strains. (C) Complementary cumulative 
distribution (CCD) of the ON-OFF delay for each neuron in response to a 1-min odor pulse. (D) Length of the ON-OFF delay after odor addition. (E) Initial OFF 

Figure 6. Activity of AIB, RIM, and AVA 
Correlates with Reversals in Freely Moving 
Animals 

(A) Schematic of microfluidic arena used to record 
calcium dynamics in freely moving animals. Ani- 
mals were loaded via the worm inlet. The flow of 
odor or buffer into the arena was controlled by 
positive flow from control inlets 1 and 2, respec- 
tively. Scale bars, 1 mm. 

(B) Mean neuronal calcium responses during a 
10-s exposure to 92 µM IAA (gray bar). Note 
different y axis for AWC and interneurons. Eight 
repeated 10-s exposures per animal; number of 
animals: AWC, n = 6; AIB, n = 4; RIM, n = 4; AVA, 
n = 4. Shaded regions are ±SEM. 

(C) Mean neuronal calcium responses in buffer 
aligned to transitions between forward, reversal, 
and omega behaviors. Data normalized to fluorescence
at point of transition (F0). Note common y 
axis for all neurons. (Forward-reversal events: 
AWC, n = 13; AIB, n = 25; RIM, n = 20; AVA, n = 23. 
Reversal-omega events: AWC, n = 14; AIB, n = 10; 
RIM, n = 24; AVA, n = 16. Reversal-forward events: 
AWC, n = 21; AIB, n = 33; RIM, n = 40; AVA, n = 20.) 
Shaded regions are ±SEM. 

(D) Behavioral responses of wild-type or RIM::TeTx 
animals exposed to 10 s of buffer, 92 nM IAA, or 
92 µM IAA (gray bars). To match calcium imaging 
data, analysis included only animals showing re- 
versals or other aversive behavior at t = 0 prior to 
odor addition. Shaded regions are ±SD calculated 
by 10,000-sample bootstrap sampling. p values 
were calculated with a Kolmogorov-Smirnov test. 
*p < 0.05, ***p < 0.001. 
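
The ±SD bands in (D) are described as coming from bootstrap sampling. A minimal sketch of one way to obtain such an estimate is given below. It assumes that individual animals are the unit resampled, which the legend does not state, and the function name and parameters are hypothetical.

```python
import numpy as np

def bootstrap_sd(responses, n_boot=10_000, seed=0):
    """SD of the mean response across bootstrap resamples of animals.

    `responses` is a 1-D array with one value per animal (for example,
    1 if the animal was reversing at a given time point, else 0).
    """
    rng = np.random.default_rng(seed)
    responses = np.asarray(responses, dtype=float)
    n = responses.size
    # Draw n_boot resamples of size n, with replacement, as index arrays.
    idx = rng.integers(0, n, size=(n_boot, n))
    return float(responses[idx].mean(axis=1).std())
```

Applying the function separately at each time point would give a band around the mean behavioral response.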
Calcium Imaging in Restrained Animals 

Animals were imaged in custom-built microfluidic chambers (Chronis et al., 
2007) in S basal buffer (Brenner, 1974) and paralyzed with 10 mM tetramisole 
hydrochloride (Sigma-Aldrich) during data acquisition to reduce movement. 
Each animal was imaged twice, separated by a 5-min interval. For the
HisCl1-silencing experiments, 10 mM histamine (Sigma-Aldrich) was included in the 
pre-incubation starvation conditions, the loading buffer, control buffer, and 
odor solutions. All animals were starved for 20 min before loading into the 
chip. Fluid streams were switched using a three-way valve (Lee Company). 
TIFF stacks were generated at 10 frames per second (fps) using a 40x objective 



(Andor iXon3 camera, Metamorph Software). In Figures 4 and 5,
wild-type data were collected in parallel with
genetically modified strains. Data analysis is 
described in Extended Experimental Procedures. 

In a control dataset, the activity of AIB, RIM, and 
AVA was also imaged in the absence of tetrami- 
sole, although the neurons could not be imaged 
simultaneously due to animal movement. AIB, 
RIM, and AVA neurons generated spontaneous 
stochastic bistable activity and responded to 
odor probabilistically, both in the absence and 
presence of tetramisole (Figure S5). The immediate responses to odor addition 
were comparable with or without tetramisole, but the duration of the OFF 
response was shorter in the absence of tetramisole (Figure S5). The activity 
of AIB, RIM, and AVA neurons either in buffer or in odor was higher in the 
absence of tetramisole, i.e., the network was more often in the ON state. 

Simultaneous Behavioral and Calcium Imaging 

Methods followed those described previously (Larsch et al., 2013), using ani- 
mals in a custom-fabricated 3x3 mm polydimethylsiloxane (PDMS) imaging 
arena with fluid flowing through the arena by gravity flow (210-cm height 



duration after odor addition. (F) For neurons that were ON prior to odor addition, the fraction of time the neuron was ON during the 1-min odor pulse. For (D)-(F),
box plots show median response (circles), 25th and 75th percentiles (boxes), and full distribution (lines). (G) Staggered mean time derivatives of calcium response
for AIB, RIM, and AVA neurons that were ON prior to odor addition in different tetanus toxin strains.

(H) Calcium dynamics of AIB, RIM, and AVA following optogenetic stimulation of AVA by Chrimson for 1 s, 10 s, and 30 s (n = 28). Light pulses (λ = 615 nm) are
represented by pink bar in all three panels.

Shaded regions are ±SEM. p values compared with WT controls were calculated with a Kolmogorov-Smirnov test (C) or Wilcoxon rank sum test with Bonferroni
correction (D-G). *p < 0.05, **p < 0.01, ***p < 0.001.
Figure 7. Weighted Wiring Diagram for AWC, AIB, RIM, and AVA

(A) Synaptic map of AWC, AIB, RIM, and AVA and other strongly inter- 
connected neurons; arrows were weighted on the basis of the total number of 
electron micrograph sections in which any synapse was observed (in- 
corporates synapse size and number). Data are from www.wormwiring.org. 



differential, 15 mm/s). Each recording consisted of eight odor pulse sequences
separated by 2 min of buffer exposure. Each pulse sequence lasted for 70 s,
with alternating 10 s odor, 10 s buffer repeats. The first two of the eight
sequences consisted of buffer-to-buffer pulses, followed by two sequences of
92 nM IAA pulses, followed by two sequences of 92 μM IAA pulses, followed
by two additional buffer-to-buffer pulses. Animals were starved in the arena
for 20 min prior to data acquisition.

Metamorph software controlled the camera, light source (Lumencor SOLA- 
LE solid-state lamp), stimulus delivery (Automate Valvebank 8 II actuator and 
Lee solenoid valves), and stimulus switching (Hamilton MVP eight-way distri- 
bution valve). Stimulus switching occurred during the 2-min buffer exposure 
when the odor stream was bypassing the arena. TIFF images were collected
at 2.5x magnification (Hamamatsu Orca Flash 4 sCMOS, Metamorph software)
at 10 fps with 10-ms pulsed illumination for each 100-ms frame. Neuronal
fluorescence just prior to odor removal (F0) was used to calculate the ΔF/F0 for the
odor-aligned fluorescence data in Figure 6B. Behaviors were binned into forward,
reversal, pause, and omega states and manually corrected to ensure
accuracy of the timing of transitions between states. For Figure 6C, only
fluorescence data from behaviors that spanned the full observational window (-1 to
5 s) were used to calculate the average.
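
The normalization of fluorescence to a baseline value can be sketched as follows. This is a minimal Python illustration, not the authors' analysis code: the function name, the use of a short averaging window for the baseline, and the example trace are all assumptions.

```python
def delta_f_over_f0(trace, baseline_start, baseline_stop):
    """Normalize a fluorescence trace as (F - F0) / F0.

    F0 is the mean of trace[baseline_start:baseline_stop]; choosing a
    window (rather than a single frame) is an assumption of this sketch.
    """
    baseline = trace[baseline_start:baseline_stop]
    if not baseline:
        raise ValueError("baseline window is empty")
    f0 = sum(baseline) / len(baseline)
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero")
    return [(f - f0) / f0 for f in trace]


# Hypothetical trace sampled at 10 fps; the baseline is the single frame
# at index 3, taken here to be the frame just before odor removal.
example = delta_f_over_f0([200.0, 210.0, 220.0, 200.0, 150.0, 100.0], 3, 4)
```

In this example a drop from 200 to 100 fluorescence units after the baseline frame corresponds to a normalized value of -0.5.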

Statistical Methods 

Since most data were not normally distributed according to the
Shapiro-Wilk test, the significance of median differences was calculated using
the Wilcoxon rank sum test with Bonferroni correction. The Kolmogorov-Smirnov
test was used for comparing probability distributions. The symmetric
uncertainty coefficient for neuron pairs (Figure 4H) was calculated as
$U(X,Y) = 2\,[H(X) + H(Y) - H(X,Y)]/[H(X) + H(Y)]$ (Press et al., 2002). X and Y represent the
binary data for neurons X and Y, H(X) and H(Y) are the marginal entropies for X
and Y, and H(X,Y) is the joint entropy for X and Y. The numerator is the mutual
information for X and Y, and the denominator is the total entropy. A coefficient
of one means the two neurons are completely dependent, while a coefficient of zero
means they are completely independent.
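
The uncertainty coefficient for a neuron pair can be computed directly from two binarized activity traces. The Python sketch below follows the formula given above; the function names, the use of natural logarithms (the ratio does not depend on the base), and the convention of returning zero when both traces are constant are assumptions of this illustration.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (in nats) from a collection of category counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def uncertainty_coefficient(x, y):
    """U(X,Y) = 2 [H(X) + H(Y) - H(X,Y)] / [H(X) + H(Y)] for paired states."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    h_x = entropy(Counter(x).values(), n)
    h_y = entropy(Counter(y).values(), n)
    h_xy = entropy(Counter(zip(x, y)).values(), n)  # joint entropy
    total = h_x + h_y
    if total == 0:
        # Both traces are constant, so the ratio is undefined;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return 2.0 * (h_x + h_y - h_xy) / total
```

Two identical ON/OFF traces give a coefficient of one, whereas traces whose four joint states occur equally often give zero.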
SUMMARY

Somatic LINE-1 (L1) retrotransposition during neurogenesis
is a potential source of genotypic variation
among neurons. As a neurogenic niche, the hippocampus
supports pronounced L1 activity. However,
the basal parameters and biological impact of L1-driven
mosaicism remain unclear. Here, we performed
single-cell retrotransposon capture sequencing (RC-seq)
on individual human hippocampal neurons and
glia, as well as cortical neurons. An estimated 13.7 somatic
L1 insertions occurred per hippocampal neuron
and carried the sequence hallmarks of target-primed
reverse transcription. Notably, hippocampal neuron
L1 insertions were specifically enriched in transcribed
neuronal stem cell enhancers and hippocampus
genes, increasing their probability of functional relevance.
In addition, bias against intronic L1 insertions
sense oriented relative to their host gene was
observed, perhaps indicating moderate selection
against this configuration in vivo. These experiments
demonstrate pervasive L1 mosaicism at genomic
loci expressed in hippocampal neurons.

INTRODUCTION 

The extent to which the genome of one cell differs from that 
of any other cell from the same body is unclear. DNA replication 
errors, mitotic recombination, aneuploidy, and transposable 
element activity can cause somatic mosaicism during ontogen- 
esis and senescence. In humans, the consequences of somatic 
mosaicism are most apparent in disease, including cancer and 
developmental syndromes (Youssoufian and Pyeritz, 2002). 
The impact of mosaicism among normal cells is relatively unde- 
fined beyond the notable exception of V(D)J recombination 
and somatic hypermutation intrinsic to lymphocyte antigen 
recognition (Hozumi and Tonegawa, 1976). Reports of retro- 
transposition (Baillie et al., 2011; Coufal et al., 2009; Evrony 
et al., 2012; Li et al., 2013; Muotri et al., 2005; Perrat et al., 
2013) and other genomic abnormalities (Cai et al., 2014; Gole
et al., 2013; McConnell et al., 2013) in animal neurons may therefore
be important given that, as for immune cells, mosaicism is a
plausible route to neuron functional diversification.

Of approximately 500,000 LINE-1 (L1) copies present in the
human genome, only ~100 members of the L1-Ta and pre-Ta
subfamilies remain transposition-competent (Beck et al., 2010;
Brouha et al., 2003). L1 mobilization primarily occurs via target-primed
reverse transcription (TPRT), a process catalyzed in cis
by two proteins, ORF1p and ORF2p, translated from the bicistronic
6 kb L1 mRNA. L1 ORF2p encodes endonuclease (EN)
and reverse transcriptase (RT) activities essential to L1 retrotransposition
and also responsible for trans mobilization of Alu
and SVA retrotransposons (Dewannieux et al., 2003; Hancks
et al., 2011; Raiz et al., 2012). A typical TPRT-mediated L1 insertion
involves a degenerate L1 EN recognition motif (5'-TT/AAAA),
an L1 poly-A tail and, crucially, produces target site duplications
(TSDs) (Jurka, 1997; Luan et al., 1993). Various host defense
mechanisms suppress L1 activity (Beck et al., 2011), including
methylation of the CpG-rich L1 promoter. Neural progenitors
and other multipotent cells can nonetheless permit L1 promoter
activation (Coufal et al., 2009; Garcia-Perez et al., 2007; Wissing
et al., 2012), a pattern accentuated in the hippocampus, likely
due to its incorporation of the neurogenic subgranular zone (Baillie
et al., 2011; Coufal et al., 2009). This coincidence of neurogenesis,
L1 activity, and mosaicism has elicited speculation that L1
mobilization could impact cognitive function rooted in the hippocampus
(Richardson et al., 2014).

Despite extensive evidence of somatic retrotransposition
in the brain, many fundamental aspects of the phenomenon
remain unclear. The rate of L1 mobilization in the neuronal lineage
is, for instance, a major unresolved issue. Estimates range
from <0.1 to 80 somatic L1 insertions per neuron (Coufal et al.,
2009; Evrony et al., 2012). Experiments using engineered L1 reporter
systems have shown that L1 mobilization is likely to occur
via TPRT in neuronal precursor cells and may be altered by
neurological disease (Coufal et al., 2011; Coufal et al., 2009;
Muotri et al., 2005; Muotri et al., 2010). However, it is unknown
whether endogenous L1 retrotransposition in hippocampal neurons
adheres to these predictions. Most importantly, it is unclear
whether somatic L1 insertions influence neuronal phenotype or
endow carrier neuronal progenitor cells with a selective advantage
or disadvantage in vivo. To address these questions, we
Figure 1. Single-Cell RC-Seq Workflow

(A) NeuN+ hippocampal nuclei were first purified by FACS (see also Figure S1).

(B) Nuclei were then picked using a self-contained microscope and micromanipulator.

(C) DNA was extracted from nuclei and subjected to linear WGA, followed by exponential PCR in two separate reactions for each nucleus, using different
enzymes.

(D) Exponential WGA products for each nucleus were combined, used to prepare Illumina libraries, and analyzed via WGS to assess genome coverage and
possible amplification biases.

(E) Libraries prepared in (D) were enriched via hybridization to L1-Ta LNA probes.

(F) Enriched libraries were sequenced with 2 x 150-mer Illumina reads and analyzed to identify novel L1 integration sites (see also Figure S2).



applied single-cell retrotransposon capture sequencing (RC- 
seq) to hippocampal neurons and glia, as well as cortical neu- 
rons, and found that L1 retrotransposition is a major endogenous 
driver of somatic mosaicism in the brain. 

RESULTS 

Pervasive L1 Mobilization in Hippocampal Neurons

Several biological and technical factors hinder accurate calculation
of somatic L1 mobilization frequency using bulk DNA extracted
from tissue, as well as subsequent PCR validation and
structural characterization of individual somatic L1 insertions
(Richardson et al., 2014). We therefore developed a single-cell
RC-seq protocol to detect somatic L1 insertions in individual
neurons. Briefly, NeuN+ hippocampal nuclei were purified by
fluorescence-activated cell sorting (FACS) (Figures 1A and S1),
with single nuclei isolated using a self-contained microscope
and micromanipulator (Figure 1B). Whole-genome amplification
(WGA) was achieved through an extensively optimized version of
the quasi-linear Multiple Annealing and Looping Based Amplification
Cycles (MALBAC) protocol (Zong et al., 2012) and
was followed by Illumina library preparation (Figures 1C and
1D). Libraries were then subjected to low-coverage (0.35x)
whole-genome sequencing (WGS) as a quality control step to
assess amplification bias and, in parallel, hybridized and processed
by RC-seq (Figures 1E and 1F).

RC-seq utilizes sequence capture to enrich DNA for the junctions
between retrotransposon termini and adjacent genomic
regions, followed by paired-end sequencing, alignment, and
clustering, to reveal L1 insertions absent from the reference
genome. Here, we replaced previous RC-seq sequence capture
pools (Baillie et al., 2011; Shukla et al., 2013) with two locked nucleic
acid (LNA) probes respectively targeting the extreme 5' and
3' ends of L1-Ta. These probes capture typical L1 insertions at a
3' L1-genome junction, and full-length or heavily 5' truncated L1
insertions at a 5' L1-genome junction (Figure S2), and delivered a
15-fold improvement in L1 enrichment compared with previous
RC-seq applied to brain (Baillie et al., 2011). Assembly of each
overlapping read pair into a “contig” enabled computational
identification of molecular chimeras and removal of PCR duplicates,
and provided single-nucleotide resolution of L1 integration
sites by fully spanning L1-genome junctions (Figure S2).

Prior to single-cell RC-seq, we performed deep coverage 
(~80x) RC-seq on bulk DNA extracted from the post-mortem 
hippocampus and matched liver samples of four individuals
(identifiers CTRL-36, CTRL-42, CTRL-45, and CTRL-55) without
evidence of neurological disease (Table S1). Bulk RC-seq on
average detected 97.5% of 960 annotated reference genome
L1-Ta copies (Evrony et al., 2012), indicating high assay sensitivity.
As expected, we detected ~210 polymorphic L1-Ta insertions
absent from the reference genome, per individual (Tables
S1 and S2). This defined the polymorphic (germline) L1-Ta
insertion cohort for each individual and provided a positive control
for subsequent single-cell RC-seq analyses.

Next, 92 individual neuronal nuclei were isolated from the 
aforementioned hippocampi, subjected to WGA and analyzed 
by WGS. Globally, WGS revealed that 4,226/4,232 (99.9%) chro- 
mosomes amplified (Figure 2A) with recurring WGA bias largely 



Figure 2. Single-Cell WGS and RC-Seq Analyses
of 92 Hippocampal Neurons

(A) Chromosome copy number in each amplified
genome, assessed by WGS. Box-and-whisker
plots indicate median chromosomal copy number
and quartiles across all neurons. Empty circles
represent chromosomes with copy number >1.5
IQR from the median. Sex chromosomes for
CTRL-36 (female, ♀) and CTRL-42, CTRL-45, and
CTRL-55 (male, ♂) are presented separately. Six
autosomes, marked in red, had copy number < 1.
Two sex chromosomes with log2 copy number <
-2 are colored purple.

(B) WGS indicated 16.2 Mb and 9.4 Mb regions of
localized AD (indicated by red bars) on chromosome
6 of neuron CTRL-45-HN-#2. Each blue
diamond corresponds to a 600 kb “bin”. One bin
with log2 copy number < -5 is colored purple.

(C) Percentages of LD (dark gray) and AD (light
gray) bins in each neuron, assessed by WGS.

(D) Percentage of reference genome L1-Ta copies
detected by single-cell RC-seq in each neuron.

(E) Percentage of polymorphic L1-Ta insertions
found in the corresponding bulk RC-seq libraries
for each individual and also detected by single-cell
RC-seq.

(F) Somatic L1 insertion counts observed in each
neuron by single-cell RC-seq.

Note: in (C-F) yellow, brown, blue, and green
histogram columns correspond to individuals
CTRL-36, CTRL-42, CTRL-45, and CTRL-55,
respectively. See also Figures S3 and S4 and Tables
S1 and S2.



limited to telomeres (Figures S3, S4A and S4B). Higher-resolution
copy-number variation (CNV) analysis based on the division of the
genome into adjustable-width “bins” with an average size of ~600 kb
revealed five non-telomeric deletions larger than ~5 Mb. The largest
and third largest of these occurred on chromosome 6 of CTRL-45
hippocampal neuron 2 (CTRL-45-HN-#2) and were 16.2 Mb and 9.4 Mb
in length (Figure 2B). An alternative CNV analysis using ~60 kb
bins indicated the presence of numerous subregions in the 16.2
Mb example where chromosomal copy number was > 2 (Figure S4C),
depicting a region of highly variable WGA performance
and, arguably, contraindicative of a genuine deletion in vivo.
Genome-wide, allelic dropout (AD) and locus dropout (LD)
respectively affected 8.0% and 0.7% of bins at 600 kb resolution
(Figure 2C, Table S1), indicating efficient amplification across
>90% of the genome. Importantly, we optimized WGA parameters
to not deplete L1-Ta copies from amplified DNA, with the
mean ratio of WGS reads aligned to reference L1-Ta 5' or 3'
L1-genome junctions at 0.81 and 1.28 of expected values,
respectively (Figures S4D and S4E; Table S1). These results
show robust WGA for individual neurons, without significant
loss of reference genome L1-Ta copies.



230 Cell 161, 228-239, April 9, 2015 ©2015 The Authors








Single-cell RC-seq applied to each of the 92 libraries analyzed
by WGS detected 61.3% of reference genome L1-Ta copies
(Figure 2D, Table S1) and 49.0% of polymorphic L1-Ta insertions
in each neuron (Figure 2E), as defined by the earlier bulk RC-seq
experiments. The latter figure provided a provisional estimate of
assay sensitivity for somatic L1 insertions. A total of 2,782 putative
somatic L1-Ta and pre-Ta insertions (Figure 2F, Table S2)
were identified in at least one hippocampal neuron, were not
detected in any bulk liver RC-seq library or more than one hippocampus
by single-cell or bulk RC-seq, and were absent from existing
L1 polymorphism databases (Ewing and Kazazian, 2010,
2011; Iskow et al., 2010; Shukla et al., 2013; Wang et al.,
2006). Of these insertions, 1,024 (36.8%) and 34 (1.2%) were
found in introns and exons, respectively. Twelve (0.4%) somatic
L1 insertions were detected at both their 5' and 3' L1-genome
junctions, 760 (27.3%) at only a 5' junction, and 2,010 (72.3%)
at only a 3' junction. Notably, nine somatic L1 insertions detected
by single-cell RC-seq were also detected and annotated as somatic
in the corresponding hippocampus bulk RC-seq library,
and 13 were detected by single-cell RC-seq in more than one
neuron from the same hippocampus. Of somatic L1 insertions,
98.2% belonged to the L1-Ta subfamily, and 1.8% were annotated
as pre-Ta. Although at 5' L1-genome junctions RC-seq
captures only full-length and very heavily truncated L1s (Figure S2),
we found 123 full-length L1 insertions, representing
4.4% of all events and including two instances of 5' transduction.
Of those insertions detected at their 3' L1-genome junction,
151 (7.5%) carried a putative transduced 3' flanking sequence
(Moran et al., 1999). This L1 3' transduction rate was lower
than reported for germline L1 retrotransposition (Goodier et al.,
2000), likely due to assay design not encompassing 3' transductions
longer than ~100 bp, as reported elsewhere (Goodier et al.,
2000; Macfarlane et al., 2013).

PCR Validation and Structural Characterization of
Somatic L1 Insertions

To determine the true positive rate of single-cell RC-seq, we
randomly selected 20 somatic L1 insertions detected at only a
3' L1-genome junction and PCR amplified the opposing 5'
L1-genome junction. This enabled detection of TPRT sequence
hallmarks that distinguish WGA artifacts from most genuine L1
integration sites; specifically a TSD, an L1 EN target motif and
an L1 poly-A tail (Jurka, 1997; Luan et al., 1993). Through PCR
and sequencing, 5' L1-genome junctions were identified for
nine insertions and, when combined with the corresponding 3'
L1-genome junctions described by RC-seq, indicated TSDs
and poly-A tails in all cases, and plausible L1 EN motifs for 7/9
(77.8%) examples (Table S2 and Data S1). PCR validated insertions
included full-length (Figure 3A) and variably 5' truncated
(Figures 3B-F) L1s. Intronic L1 insertions were found sense oriented
to two genes expressed in brain, ZFAND3 (Figure 3B)
and USP33 (Table S2). One L1 insertion incorporated a 3' transduction
and was detected by PCR in two neurons of CTRL-42
(Figure 3D). Further, PCR applied to the full panels of analyzed
neurons from each individual revealed that two other L1 insertions
were present in 10/21 and 2/21 neurons, respectively (Figures
3E and 3F). Three of the validated L1 insertions generated
TSDs >40 bp in length.



These experiments showed that nearly half of somatic L1 insertions
detected by single-cell RC-seq at a 3' L1-genome
junction could be confirmed as genuine TPRT-mediated retrotransposition
events. By contrast, PCR validation for 10
randomly selected exonic L1 insertions detected at a 5' L1-genome
junction by single-cell RC-seq failed to find the
opposing 3' L1-genome junction in all cases (Table S2). This
was consistent with the L1 poly-A tail obstructing PCR amplification
of somatic L1 insertion 3' ends (Baillie et al., 2011) and
arguably did not resolve whether L1 insertions detected only
at a 5' L1-genome junction were false positives. Finally, we
selected four L1 insertions found at both their 5' and 3' L1-genome
junctions by single-cell RC-seq; all four were confirmed by PCR
and presented TPRT hallmarks, including one with a 92 bp TSD
(Table S2).

Nearly 75% of somatic L1 insertions found by single-cell RC-seq
were detected only at a 3' L1-genome junction (Figure S2).
Given this preponderance, we sought to ascertain why the
matching 5' L1-genome junction could not be identified by
PCR for 11/20 selected examples of this type. PCR amplification
failure was potentially due to RC-seq false positives, structurally
exotic L1 insertions (Gilbert et al., 2005) or, alternatively, WGA
inconsistently amplifying the 5' L1-genome junctions of insertions
detected at a 3' L1-genome junction by single-cell RC-seq.
To model the latter possibility, we randomly selected
12 polymorphic L1 insertions detected by bulk RC-seq and
confirmed as heterozygous by genotype PCR. We performed
PCR using bulk DNA to confirm each insertion was detectable
at its 5' L1-genome junction and then selected 100 random examples
in individual neurons where these polymorphic L1s
were detected at only a 3' L1-genome junction by single-cell
RC-seq (Table S2). We attempted PCR amplification of the corresponding
5' L1-genome junction for each neuron, hence recapitulating
the validation process for somatic L1 insertions, and
confirmed 50/100 examples. This assay indicated the maximum
PCR validation rate (50.0%) for somatic L1 insertions detected at
only a 3' L1-genome junction by single-cell RC-seq and, given
the validation rate reported above (9/20, 45%), implied a true
positive rate potentially as high as 9/10 (90.0%).
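
The arithmetic behind the implied true positive rate can be made explicit. The short Python sketch below divides the observed validation rate by the ceiling set by the polymorphic control assay; the variable names are illustrative, and the only inputs are the counts reported above.

```python
from fractions import Fraction

# Somatic insertions: 5' junction recovered by PCR for 9 of 20 candidates.
observed_rate = Fraction(9, 20)

# Known heterozygous polymorphic insertions: 5' junction recovered for
# 50 of 100 single-neuron examples, i.e. the best rate PCR can achieve.
maximum_rate = Fraction(50, 100)

# If true insertions validate only half the time, an observed 45%
# validation rate corresponds to an implied true positive rate of 9/10.
implied_true_positive_rate = observed_rate / maximum_rate

print(float(observed_rate), float(maximum_rate), float(implied_true_positive_rate))
```

The implied rate is an upper bound in the sense used above: it assumes that somatic and polymorphic insertions fail PCR validation for the same technical reasons and at the same frequency.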

L1 Mobilization Frequency in Diverse Neural Cell
Populations

Single-cell RC-seq identified mean somatic L1 insertion counts
of 48.4, 27.5, 30.5, and 14.8 per hippocampal neuron in CTRL-36,
CTRL-42, CTRL-45, and CTRL-55, respectively, yielding an
overall mean count of 30.4 (Figure 2F). To estimate the overall
true positive mean, we incorporated the PCR validation rate
(45.0%) calculated above, leading to a conservative rate calculation
of 13.7 somatic L1 insertions per hippocampal neuron. If,
more conservatively, only L1 insertions detected at a 3' L1-genome
junction were considered, the true positive mean was
9.9. Conversely, if all L1 insertions were considered, the
maximum PCR validation rate calculated above (90%) was generously
incorporated, and assay sensitivity in terms of
polymorphic L1 insertions detected (49.0%) was corrected for, the estimated true
positive mean increased greatly to 55.8. Thus, given a true
positive mean of 13.7 somatic L1 insertions per neuron, and
the detection of at least one event in every neuron (Figure 2F),
Figure 3. PCR Validation of Somatic L1 Insertions

(A-F) Validated examples from hippocampal
neuron single-cell RC-seq data included: (A) a full-length
L1 insertion in neuron CTRL-42-HN-#13; (B)
a truncated L1 insertion in neuron CTRL-42-HN-#11;
(C) a heavily truncated L1 insertion in neuron
CTRL-55-HN-#15; and (D) a very heavily truncated
L1 insertion yielding a 3' transduction in neuron
CTRL-42-HN-#4, also validated in neuron CTRL-42-HN-#3,
and traced to a donor L1-Ta on chromosome
3; (E) a very heavily truncated L1 insertion
detected in CTRL-42-HN-#13 and validated in
10/21 CTRL-42 hippocampal neurons tested.
Asterisks denote neurons where validation succeeded;
(F) a very heavily truncated L1 insertion
detected in CTRL-42-HN-#4 and also validated in
CTRL-42-HN-#22. Note: in (A-F) the 3' L1-genome
junction was detected by single-cell RC-seq, while
the 5' L1-genome junction was identified by
insertion-site PCR (using primers indicated by α
and β) and sequencing. Green triangles indicate
TSDs. Numbers below the 5' L1-genome junction
indicate the equivalent L1-Ta consensus position.
See also Table S2 and Data S1.
we concluded that L1 mosaicism was ubiquitous among the
hippocampal neurons studied.
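
The conservative and upper estimates quoted above follow from scaling the mean putative insertion count by a validation rate and, optionally, dividing by assay sensitivity. The Python sketch below reproduces the two hippocampal figures from the reported inputs; the function and variable names are assumptions made for illustration.

```python
def true_positive_mean(putative_mean, validation_rate, sensitivity=1.0):
    """Scale a mean putative insertion count per cell.

    validation_rate: fraction of putative insertions taken to be genuine.
    sensitivity: fraction of real insertions the assay detects
                 (1.0 means no correction for missed insertions).
    """
    if not 0 < sensitivity <= 1:
        raise ValueError("sensitivity must be in (0, 1]")
    return putative_mean * validation_rate / sensitivity


# Reported overall mean of putative insertions per hippocampal neuron.
PUTATIVE_MEAN = 30.4

# Conservative estimate: 45.0% PCR validation rate, no sensitivity correction.
conservative = true_positive_mean(PUTATIVE_MEAN, 0.45)

# Upper estimate: 90% maximum validation rate, corrected for 49.0% sensitivity.
upper = true_positive_mean(PUTATIVE_MEAN, 0.90, sensitivity=0.49)

print(round(conservative, 1), round(upper, 1))
```

The roughly fourfold spread between the two values comes entirely from the choice of validation rate and whether missed insertions are corrected for, not from any difference in the underlying counts.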

Prior in vitro experiments based on an engineered L1 reporter
indicated that glia may support far less L1 mobilization than neurons
(Coufal et al., 2009). To evaluate glial lineage endogenous
L1 retrotransposition in vivo, we performed single-cell RC-seq
upon 22 glial nuclei (NeuN−/Ki67−) isolated from CTRL-42,
CTRL-45, and CTRL-55 hippocampi, and detected 316 putative
somatic L1 insertions (Figures 4A and S5). This produced a mean
true positive estimate of 6.5 insertions per glial cell, based on the
PCR validation rate determined for hippocampal neurons
(45.0%). This rate was 52.6% lower than the estimated 13.7 insertions
for hippocampal neurons, a significant difference (p <
0.005, two-tailed t test, df = 112). Interestingly, four insertions
were found in both glial and neuronal cells by single-cell RC-seq,
with one of these instances detected
at both its 5' and 3' L1-genome junctions,
revealing a 12 bp TSD (Table S2). We
concluded that L1 insertions can arise in
proliferating neural stem cells prior to glial
or neuronal commitment, while glia otherwise
support less L1 mobilization than
neurons.

A recent single-cell genomic analysis of
300 cortex and caudate nucleus pyramidal
neurons reported <0.1 somatic L1
insertions per cell, and concluded that
L1 was not a major driver of neuronal diversity
(Evrony et al., 2012). However,
the biological or technical reasons for
such disparate results compared with
prior data from the hippocampus were
unclear. We therefore performed single-cell
RC-seq upon 35 NeuN+ nuclei isolated from CTRL-42,
CTRL-45 and CTRL-55 cortex tissue, including seven pyramidal
neurons, and identified 1,262 putative somatic L1 insertions (Figures
4B and S5). This provided a true positive mean estimate of
16.3 insertions per cortical neuron, a figure higher than hippocampal
neurons, but not significantly different. An estimated
10.7 insertions occurred per cortex pyramidal neuron, a rate
substantially lower than the remaining cortical neurons but a difference
that fell short of statistical significance (p < 0.16, two-tailed
t test, df = 33). These data reveal L1 mosaicism in
cortical neurons and exclude a biological explanation for inconsistency
with the previous study.

PCR validation including TSD discovery underpins accurate
calculation of L1 mobilization frequency and reflects experimental
veracity independent of methodology (Richardson et al.,
Figure 4. L1 Mobilization in Diverse Neural Cell Types

(A) Somatic L1 insertion counts observed by single-cell RC-seq applied to hippocampal glia.

(B) As for (A) except for cortical neurons. Seven pyramidal neurons are indicated by an asterisk.

(C) As for (A) except for AGS-1 hippocampal neurons.

(D) L1 qPCR indicated lower L1 copy number in AGS-1 hippocampus versus controls (p < 0.002, two-tailed t test, df = 23). Data represent the mean of 5 technical
replicates ± SD.

(E) Mean somatic L1 insertion counts detected by single-cell RC-seq in each hippocampus strongly correlated (R² = 0.93) with L1 copy number quantified by
qPCR (D).

See also Figure S5 and Table S2.



2014). It is therefore notable that, at this stringency, Evrony et al.
reported a PCR validation rate of 1/96 and a consequential
paucity of L1 activity. Two key technical considerations may
explain our discrepant findings. First, RC-seq reads fully span
L1-genome junctions (Figure S2), enabling bioinformatic identification
of molecular chimeras before PCR validation. The earlier
work by contrast followed a design (Ewing and Kazazian, 2010)
that typically did not resolve L1-genome junctions, prohibiting
computational removal of chimeric reads. Instead, the authors
maintained that artifacts, including those generated by WGA
and Illumina library preparation, should present lower read depth
than genuine L1 insertions, and essentially adhered to the same
principle in a very recent study applying WGS to a smaller number
of neurons (Evrony et al., 2015). This assumption is crucial as, at
least in single-cell RC-seq libraries, putative chimeras are disproportionately
likely to amplify efficiently and accrue high read
depth (Figures 5A and 5B). Second, Evrony et al. selected candidates
for PCR validation effectively as a function of high read
count and not at random (Figure 5C). This approach would
strongly enrich for artifacts if applied to single-cell RC-seq data
(Figure 5B). It follows that, without the capacity to filter artifacts
a priori, the previous study resolved numerous molecular
chimeras after PCR and capillary sequencing of putative L1
insertions, substantially reducing the reported validation rate.
By contrast, we selected PCR validation candidates at random
(Figure 5D). These factors plausibly explain why our validation
rate of 9/20 (45.0%) was significantly higher than the rate of
1/96 (1.0%) reported by the earlier work (p < 1 × 10^-10, chi-square
test, df = 1), as well as the disparate estimates of somatic
L1 retrotransposition made by each study.
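
A comparison of two validation rates of this kind can be framed as a 2 x 2 contingency test. The Python sketch below computes an uncorrected Pearson chi-square statistic with one degree of freedom for 9/20 versus 1/96; it is an illustration under stated assumptions, not a reconstruction of the original calculation. In particular, whether a continuity correction was applied in the study is not stated, so the sketch does not attempt to reproduce the reported p value exactly, and the function name is hypothetical.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom. For 1 df the
    upper-tail probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: this study (9 validated, 11 not) and the earlier work (1 validated, 95 not).
chi2, p = pearson_chi_square_2x2(9, 11, 1, 95)
print(round(chi2, 1))
```

Because one expected cell count in this table is below five, an exact test such as Fisher's would be a reasonable alternative; the chi-square form is shown here only because it is the test named in the text.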

Recent qPCR-based estimates of L1 CNV in human tissue, as
well as in vitro L1 reporter assays, indicate L1 mobilization may
be pronounced in a range of neurodevelopmental and psychiatric
diseases (Richardson et al., 2014) including Aicardi-Goutières
syndrome (AGS). AGS is a rare, severe neurodevelopmental
condition, characterized by mutations in several genes thought
to inhibit reverse transcription, including SAMHD1 (Zhao et al.,
2013). To address whether SAMHD1 deficiency in AGS patients
increases neuronal L1 mobilization, we first applied bulk RC-seq
to the post-mortem hippocampus and fibroblasts of an AGS patient
(identifier AGS-1) carrying two loss-of-function SAMHD1
mutations. We then performed single-cell RC-seq upon 21
neuronal nuclei from AGS-1 hippocampus and identified 373 putative
somatic L1 insertions (Figures 4C and S5), leading to a true
positive mean estimate of 8.0 insertions per AGS-1 neuron. This
figure was significantly (p < 0.03, two-tailed t test, df = 112) lower
Figure 5. Single-Cell RC-Seq Efficiently Excludes Molecular
Artifacts

(A) Distribution of read “peaks” indicating possible somatic L1 insertions detected
by single-neuron L1 insertion profiling (L1-IP) (Evrony et al., 2012).

(B) As for (A), except for all single-cell RC-seq data presented here. Peaks were
annotated as chimeric or as likely genuine L1 insertions by sequence analysis
of RC-seq reads.

(C) Distribution of read peak height for L1 insertions selected for validation by
Evrony et al. The L1 insertion successfully validated by TSD discovery is
colored black. The remaining insertions not validated to this standard are
colored red.

(D) As for (C), except for L1 insertions detected by single-cell RC-seq and
selected at random for validation.







Figure 6. Hallmarks of TPRT Revealed by Bulk RC-Seq

(A) A 6 kb L1-Ta element incorporates 5' and 3' UTRs and two ORFs. ORF2p
presents EN and RT domains. Methylation of a CpG island present in the 5'
UTR regulates L1 promoter activity. The locations of two capture probes used
by RC-seq are indicated below the L1. Note: TSDs and probes are not drawn
to scale. See also Figure S2.

(B) TPRT hallmark features, including TSDs and an L1 EN recognition motif,
can be identified by RC-seq, including for insertions detected at only a 5' or 3'
L1-genome junction.

(C) Consensus L1 EN motifs for polymorphic and somatic L1 insertions detected
at their 5' and 3' L1-genome junctions, and somatic L1 insertions found
at only a 3' L1-genome junction.

(D) Observed TSD size distributions for polymorphic and somatic L1 insertions,
normalized to random expectation. See also Figure S6.



than the 13.7 somatic L1 insertions found for control hippocampal
neurons. A more significant difference was observed when
AGS-1 neurons were compared only with the age (18 years)
and gender (female) matched hippocampal neurons of CTRL-36
(p < 0.0001, two-tailed t test, df = 44). As a corollary, L1
qPCR also indicated significantly lower (p < 0.002, two-tailed t
test, df = 23) L1 copy number in AGS-1 hippocampus versus
controls (Figure 4D). Finally, the results of the L1 CNV assay
were strongly correlated (R² = 0.93) with the mean somatic L1
insertion frequencies estimated by single-cell RC-seq
(Figure 4E). We therefore concluded that L1 mobilization was
unlikely to be elevated in AGS-1 hippocampus.



Somatic L1 Retrotransposition Occurs via TPRT

As the 13 total somatic L1 insertions detected by single-cell
RC-seq and validated by PCR generally followed the TPRT model,
we next assessed whether somatic L1 insertions detected by
bulk RC-seq also carried TPRT signatures. RC-seq separately
applied to DNA extracted from the four control hippocampus
samples elucidated 318,866 putative somatic L1 insertions
(Table S1). Again exploiting L1-genome junction resolution by
RC-seq reads (Figures 6A and 6B and S2), we found a strong
enrichment for the L1 EN motif (Figure 6C), a typical TSD size
range of 5-35 nt (Figures 6D and S6) and a median L1 poly-A
tail length of 33 nt for somatic L1 integration sites identified by
bulk RC-seq. We also identified a substantial group of insertions
with TSDs > 40 bp in length (Figure S6). Thus, single-cell RC-seq
and RC-seq applied to bulk DNA both elucidated the hallmark
sequence features of TPRT-mediated retrotransposition.

Somatic L1 Insertions Are Enriched in Neurobiology
Genes

Substrate DNA chromatinization modulates L1 EN target site
nicking in vitro (Cost et al., 2001). As such, dynamic changes
to chromatin state during neurogenesis may impact the associated
genome-wide pattern of L1 mobilization. An intersection
of somatic L1 insertion sites detected by hippocampus bulk
RC-seq with RefSeq gene coordinates revealed significant (p <
1.0 x 10“^^°, Fisher’s exact test, Bonferroni correction) depletion
for insertions in exons and promoters versus random sampling
and significant (p < 3.8 x 10“^°) enrichment for introns
versus polymorphic insertions (Table S3). Exons and introns carrying
gene ontology (GO) terms relevant to neurobiology were,
however, enriched for somatic L1 insertions (Tables S4 and S5)
compared with random sampling performed by gene identifier
or by genomic coordinate (p < 4.5 x 10“^ and p < 0.03, respectively,
Fisher’s exact test, Benjamini-Hochberg correction). The
latter result indicated enrichment for L1 insertions in genes
expressed in the brain, despite taking into account that their length
is on average >50% greater than that of other genes. By a considerable
margin, the most enriched GO term found (Table S5) was
“regulation of synapse maturation” (p < 1.7 x 10“®°, Fisher’s
exact test, Benjamini-Hochberg correction). Genome-wide patterns
for somatic L1 insertions detected in glia and neurons by
single-cell RC-seq typically corroborated those found by bulk
RC-seq, including enrichment in introns and depletion from
promoters and exons (Table S3) and even stronger enrichment in
neurobiology genes annotated by GO term (Tables S4 and S5).
Intriguingly, in AGS-1 hippocampal neurons we did not observe
enrichment for L1 insertions in neurobiology genes (Table S4),
whereas enrichment was observed for control hippocampal neurons,
even if each individual was analyzed separately. As a control
experiment, from the liver bulk RC-seq data we identified a
set of 175 potential liver-specific L1 insertions (see Extended
Experimental Procedures) that collectively presented a clear L1
EN consensus motif (Figure S6D) and, owing to the sensitivity
of bulk RC-seq, were unlikely to represent incorrectly annotated
polymorphic L1 insertions (Table S1). Notably, these liver-specific
L1 insertions exhibited no enrichment for neurobiology
genes (Table S4). We concluded that somatic L1 retrotransposition
in neural cells preferentially occurs into the euchromatic regions
of the genome contributing to neurobiology.
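The enrichment comparisons above rest on Fisher's exact test applied to a 2 x 2 table (insertions inside versus outside a gene set, observed versus randomly sampled). The following is a minimal sketch of a one-sided enrichment test using only the Python standard library. The function name, the one-sided formulation, and the example counts are illustrative assumptions, not the authors' pipeline or data.

```python
from math import comb


def fisher_enrichment_p(obs_in, obs_out, bg_in, bg_out):
    """One-sided Fisher's exact p-value for over-representation.

    obs_in / obs_out: observed insertions inside / outside the gene set.
    bg_in / bg_out:   background (e.g., randomly sampled) sites inside / outside.
    Returns P(X >= obs_in) under the hypergeometric null.
    """
    n_total = obs_in + obs_out + bg_in + bg_out
    n_observed = obs_in + obs_out      # row total: all observed insertions
    n_in_set = obs_in + bg_in          # column total: all sites in the gene set
    denom = comb(n_total, n_observed)
    upper = min(n_observed, n_in_set)
    numer = sum(
        comb(n_in_set, x) * comb(n_total - n_in_set, n_observed - x)
        for x in range(obs_in, upper + 1)
    )
    return numer / denom


# Hypothetical counts, for illustration only (not taken from Tables S3-S5).
p_value = fisher_enrichment_p(obs_in=30, obs_out=70, bg_in=15, bg_out=85)
print(f"one-sided enrichment p = {p_value:.4g}")
```

When many gene sets or GO terms are tested, each p-value would then be adjusted for multiple testing (Bonferroni or Benjamini-Hochberg, as reported in the text).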

Hippocampal L1 Insertions Prefer Genomic Loci
Transcribed in the Hippocampus

Open chromatin is a typical prerequisite for efficient transcription
(Neph et al., 2012). With this in mind, we used single-molecule
cap analysis of gene expression (CAGE) transcriptome profiling
data from the FANTOM5 consortium (Forrest et al., 2014) to
test whether genes strongly transcribed in the hippocampus
were specifically enriched for somatic L1 insertions in hippocampal
neurons. We first identified genes differentially upregulated in
hippocampus, cortex, caudate nucleus, liver, or heart tissue
surveyed by CAGE and then intersected these gene lists with the
cohort of intragenic somatic L1 insertions detected by single-cell
RC-seq applied to hippocampal neurons. Only those genes
upregulated in hippocampus versus heart, and hippocampus
versus liver, were significantly enriched (p < 0.05, Fisher’s exact
test, Benjamini-Hochberg correction) for insertions (Figure 7A,
Table S6). Somatic L1 insertions in hippocampal glia were
also most enriched in genes upregulated in the hippocampus
(p < 0.07). No enrichment was observed for cortical neurons
while, intriguingly, the liver-specific L1 insertion cohort exhibited
enrichment (p < 0.11) in genes upregulated in liver versus
hippocampus (Figure 7A). Finally, we calculated the significance of
enrichment for hippocampal neuron L1 insertions in genes
upregulated in hippocampus while incrementally introducing
putative artifacts described in Figure 5B. We found that statistical
significance was no longer achieved once the dataset contained
15% or more artifacts (Figure 7B), hence demonstrating how
experimental noise reduced in single-cell RC-seq analyses
would otherwise obscure genome-wide enrichment. These
experiments altogether reveal context-dependent, preferential L1
mobilization into strongly transcribed loci.
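Because each tissue comparison above yields its own Fisher's exact p-value, the reported values are Benjamini-Hochberg corrected. A minimal sketch of that adjustment, using only the Python standard library, is shown here. The function name and the input p-values are hypothetical and do not correspond to the values in Figure 7A or Table S6.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four tissue comparisons (illustration only).
raw = [0.01, 0.04, 0.03, 0.005]
for p_raw, p_adj in zip(raw, benjamini_hochberg(raw)):
    print(f"raw = {p_raw:.3f}  adjusted = {p_adj:.3f}")
```

A comparison would be called significant when its adjusted value falls below the chosen threshold (0.05 in the text).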

Noting that euchromatin is also a signature of active
enhancer elements, we intersected our list of somatic L1 insertions
detected by hippocampus bulk RC-seq with an extensive
FANTOM5 catalog of transcribed constitutive and cell-type specific
enhancers defined by histone modifications and CAGE-delineated
transcriptional activity (Andersson et al., 2014).
Globally, no substantial difference was observed in the rate of
L1 insertions in all enhancers versus random expectation. However,
of 47 cell-type specific enhancer sets, only neuronal stem
cell enhancers were significantly enriched for somatic L1 insertions,
compared with random expectation (p < 0.01, Fisher’s
exact test, Bonferroni correction) and compared with the union
of the remaining 46 cell-type specific enhancer sets (Figure 7C;
p < 1.0 x 10“^, Fisher’s exact test). This enrichment was highest
for L1 insertions within 100 nt of an enhancer, and was observed
up to 500 nt from defined enhancer boundaries (Figure 7D).
No enrichment was observed for astrocytes or for other cells
not of the neuronal lineage, such as hepatocytes (Figure 7D).
The smaller cohorts of somatic L1 insertions detected by single-cell
RC-seq and liver bulk RC-seq were insufficient to
perform meaningful statistical analyses of L1 insertional preference
with regard to enhancers. Nonetheless, hippocampus
bulk RC-seq indicated that neuronal stem cell-specific enhancers
were the most highly enriched genome functional
element in absolute terms (1.8-fold) for somatic L1 insertions.
This reinforced the view that L1 mobilization during neurogenesis
impacts regulatory and protein-coding loci specifically active in
the hippocampus.

A Potential Signature of Neurogenic L1 Selection

De novo germline L1 insertions can be highly deleterious to gene
function, and commonly undergo purifying selection (Boissinot
et al., 2001; Han et al., 2004). The L1 ORF2 segment of sense-oriented
intronic L1 insertions particularly hinders RNA polymerase
processivity (Han et al., 2004; Lee et al., 2012). Hence, while
sense and antisense intronic L1 insertions are assumed to occur
with equal frequency in the germline, sense insertions are
selected against more strongly and tend to be eliminated from
Figure 7. Genome-Wide Somatic L1 Insertion Patterns

(A) Somatic L1 insertions detected by single-cell RC-seq in hippocampal neurons and glia were enriched in genes differentially upregulated in hippocampus.
Liver-specific L1 insertions detected by bulk RC-seq were moderately enriched in genes upregulated in liver. No enrichment was observed for cortical neurons.
Color intensity is based on the absolute log2-transformed p value determined by Fisher’s exact test (Benjamini-Hochberg correction) with blue and orange colors
representing depletion and enrichment, respectively. Note: in each matrix pairwise comparison, the more highly expressed tissue is on the y axis.

(B) Hippocampal somatic L1 insertions were statistically enriched in genes upregulated in hippocampus versus liver (black) or hippocampus versus heart (gray),
as shown in (A). However, as previously filtered molecular chimeras (see Figure 5B) were re-introduced into this dataset, enrichment rapidly became no longer
significant.

(C) Of the transcribed cell-type specific enhancers defined by FANTOM5, only those of neuronal stem cells were enriched (observed/expected) for somatic L1
insertions detected by bulk hippocampus RC-seq, compared with other enhancers (p < 1.0 x 10“"^, Fisher’s exact test, Bonferroni correction).

(D) Somatic L1 insertion enrichment in neuronal stem cell enhancers (black) extended 500 bp from enhancer boundaries. No enrichment was observed for
astrocyte (gray) or hepatocyte (red) enhancers.

See also Tables S2, S3, S4, S5, and S6.



the population. It follows that an estimated 43.3% of recent
intronic L1-Ta insertions are sense oriented, versus only 34.1%
of fixed L1-Ta insertions and 39.7% of all polymorphic L1-Ta
insertions (Ewing and Kazazian, 2010). By contrast, sense-oriented
intronic L1 insertions are not depleted in tumors (Lee et al., 2012).
Among the control individuals examined here, we found that, as
expected, 42/101 (41.6%) of intronic, polymorphic germline L1
insertions were sense oriented to their host gene. Surprisingly,
406/1,024 (39.6%) of intronic somatic L1 insertions detected in
hippocampal neurons by single-cell RC-seq were also sense oriented,
significantly less than the expected 50% (p < 0.0001,
exact binomial test). This proportion was 47/136 (34.6%) and
166/503 (33.0%) for glia and cortical neurons, respectively.
Adhering to the prevailing germline model of L1 evolutionary
selection, we concluded that some somatic L1 insertions may arise
sufficiently early in neurogenesis to impact neural progenitor cell
fitness, as indicated by a depletion of sense-oriented events in
mature neurons and glia.
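The orientation comparison above can be reproduced as an exact binomial test of the observed sense-oriented count against a null proportion of 50%. The sketch below uses only the Python standard library and the counts quoted in the text. The two-sided formulation and the function name are assumptions, since the text specifies only an "exact binomial test".

```python
from math import comb


def exact_binomial_p_half(successes, trials):
    """Two-sided exact binomial p-value against a null proportion of 0.5.

    Under p = 0.5 the distribution is symmetric, so the two-sided p-value
    is twice the smaller tail probability, capped at 1.
    """
    k = min(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k + 1))
    # Integer arithmetic keeps the tail sum exact before the final division.
    return min(1.0, 2 * tail / 2 ** trials)


# Sense-oriented intronic somatic L1 insertions reported in the text.
counts = {
    "hippocampal neurons": (406, 1024),
    "glia": (47, 136),
    "cortical neurons": (166, 503),
}
for label, (sense, total) in counts.items():
    p = exact_binomial_p_half(sense, total)
    print(f"{label}: {sense}/{total} = {sense / total:.1%}, p = {p:.3g}")
```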

DISCUSSION 

Our experiments firmly establish that L1-driven mosaicism pervades
the hippocampus and is mediated by TPRT. That we
found 13.7 somatic L1 insertions per hippocampal neuron was
unexpected given a prior estimate of <0.1 insertions per cortical
neuron (Evrony et al., 2012). By discovering here a myriad of L1
insertions in cortical neurons, we exclude a biological explanation
for this discrepancy and instead propose that the process
by which the earlier work selected insertions for validation led
to a significant underestimate of L1 retrotransposition frequency.
Indeed, the mobilization rate reported here much more closely
resembles an earlier estimate of 80 somatic L1 insertions per
brain cell, calculated via L1 qPCR (Coufal et al., 2009).

Beyond this, our data demonstrate that L1 insertions in hippocampal
neurons and glia are preferentially found in protein-coding
genes highly transcribed in the hippocampus. Transcribed
enhancers active in neuronal stem cells are also enriched for somatic
L1 insertions, indicating likely L1 perturbation of regulatory
elements. L1 insertions in cortical neurons were, however, not
significantly enriched in genes highly transcribed in the cortex.
We speculate that this could be due to cortical neurogenesis
primarily occurring during fetal development (Spalding et al., 2005),
which presents a genome-wide transcriptional profile different to
that of the adult cortex. Although L1 mobilization was not
increased in AGS-1 hippocampal neurons, the pattern of L1 insertions
was prospectively different to that of controls, the reasons
for which are presently unclear. The most obvious caveat
of this analysis is that, due to the extreme rarity of the disease,
only one AGS patient hippocampus was studied. Nonetheless,
this experiment serves as a proof-of-principle demonstration
that single-cell RC-seq could be used in the future to assess
abnormal L1 mobilization in neurological disease. Finally, we
noted that somatic L1 insertions in neurons bore substantially
longer TSDs on average than polymorphic L1 insertions, corroborated
by structural characterization of L1 integration sites found
by single-cell RC-seq. Unusually long TSDs have previously
been identified using an engineered L1 reporter system in
HeLa cells (Gilbert et al., 2005). As also hypothesized in that
context, pervasive euchromatinization in neural progenitor cells
may promote the formation of long TSDs.

The predominant developmental timing of endogenous L1
mobilization in the brain remains unclear. Although the vast majority
of somatic L1 insertions detected by single-cell RC-seq
were found in one cell each, a small proportion of L1s were detected
in multiple cells, including examples found in both glia
and neurons, indicating L1 mobilization in a common multipotent
progenitor cell. Three somatic L1 insertions were validated by
PCR in multiple neurons, including one example found in nearly
50% of the neurons assayed. Thus, although most L1 insertions
may occur in one or a handful of neurons, a substantial number
appear to arise during early neurogenesis. Indeed, the signature
of potential selection against somatic L1 insertions sense oriented
to host gene introns suggests that many retrotransposition
events precede terminal neural cell maturation. We speculate
that depletion of these events could be explained by preferential
L1 integration into neurogenesis genes, thereby impacting the
survival or differentiation potential of neural progenitor cells. It
also cannot be excluded that somatic L1 integration primarily occurs
antisense to host gene introns, though we currently lack a
mechanistic explanation for this preference.

Neuronal genome mosaicism may not be restricted to somatic
L1 insertions. Alu and SVA retrotransposons trans mobilized by
L1 may also contribute mosaic insertions. Other than transposable
element activity, recent studies have reported localized
and chromosome-wide CNV in normal neurons (Cai et al.,
2014; Gole et al., 2013; McConnell et al., 2013). We find no definitive
evidence of these events in our data, though it must be
noted that our CNV analyses were expressly geared to discern
genomic deletions caused by WGA failure or variability. However,
we did find consistent WGA inefficiency
at telomeres, while others have reported that most apparent
small genomic deletions occur close to telomeres (McConnell
et al., 2013).

L1 mosaicism may also occur outside of the brain, for
instance during early embryogenesis (Garcia-Perez et al.,
2007; Kano et al., 2009) or, as we previously reported for a single
L1 insertion, in the liver (Shukla et al., 2013). However, some cell
types present practical and technical challenges not posed
by neural cells. For example, hepatocytes are frequently multinucleated
and sustain aneuploidy and polyploidy, greatly
complicating single-cell genomic analysis. Thus, although the
liver-specific L1 insertions detected here by bulk RC-seq
consistently bore L1 EN motifs and were enriched in genes
differentially upregulated in liver, we were unable to corroborate
these findings with single-cell RC-seq or downstream PCR validation.
Future methodological advances will therefore likely be
required to elucidate L1 mosaicism in the liver, and elsewhere
in the body.

The capacity to locate somatic L1 insertions in individual
neural cell genomes is a major step toward determining whether
mosaicism impacts neurobiological function. Limitations in assaying
the transcriptome and genome of the same cell, however,
currently prohibit functional assays of individual somatic L1 insertions.
Nonetheless, given the frequency of these events, their
mutagenic potential for protein-coding and regulatory regions
and an apparent preference for euchromatic DNA linked to
neurobiological function, it is not unreasonable to predict that
L1-driven somatic mosaicism may alter the functional properties
of the brain.

EXPERIMENTAL PROCEDURES 

Full protocols can be found in the Extended Experimental Procedures. 

Samples 

Control tissues were provided by the Edinburgh Sudden Death Brain and
Tissue Bank. Tissues were obtained post-mortem from AGS-1 with ethical
approval to be used as described. AGS-1 carried SAMHD1 mutations
c.646-647delAT (p.Met216fs) and c.1223G>C (p.Arg408Pro). Patient age and
gender information is provided in Table S1.

Single-Cell RC-Seq 

NeuN+ (neuronal) and NeuN-/Ki67- (glial) nuclei were isolated via FACS
from brain tissue, individually picked under a microscope and subjected to
linear WGA. Products were split into three exponential PCR reactions
utilizing two different kits, and then combined for library preparation and
downstream PCR validation. Multiplexed Illumina libraries were pooled and
sequenced (2 x 150-mer reads) to assess allelic dropout and L1-genome
junction depletion, then hybridized separately to two LNA probes respectively
matching the 5' and 3' ends of L1-Ta. Post-enrichment, RC-seq libraries were
sequenced (2 x 150-mer reads), computationally processed, filtered to
exclude artifacts, and finally used to call polymorphic and somatic L1
insertions.

5' L1-Genome Junction Validation and Characterization

Twenty somatic L1 insertions detected by single-cell RC-seq at a 3'
L1-genome junction were selected at random for structural characterization by
PCR amplification and sequencing of the corresponding 5' L1-genome junction.
For each example, initial PCR template DNA consisted of WGA material
from the relevant neuron. As the extent of L1 5' truncation was unknown,
primers oriented antisense to L1 were designed approximately every 500 bp
through the L1-Ta consensus and combined with an insertion site primer
unique to each locus. 5' L1-genome junctions were identified by PCR and
sequencing and then separately PCR amplified again using WGA material
from the selected neuron, WGA material from other single neurons from the
same individual, as well as matched bulk DNA. Amplified material was stored
and handled separately to bulk DNA.

ACCESSION NUMBERS 

RC-seq and WGS data are available from the European Nucleotide Archive
(ENA) using the identifier PRJEB5239.

SUPPLEMENTAL INFORMATION

Supplemental Information includes Extended Experimental Procedures, six
figures, six tables, and one data file and can be found with this article online
at http://dx.doi.org/10.1016/j.cell.2015.03.026.

SUMMARY

In vitro modeling of human disease has recently 
become feasible with induced pluripotent stem cell 
(iPSC) technology. Here, we established patient- 
derived iPSCs from a Li-Fraumeni syndrome (LFS) 
family and investigated the role of mutant p53 in 
the development of osteosarcoma (OS). LFS iPSC- 
derived osteoblasts (OBs) recapitulated OS features 
including defective osteoblastic differentiation as 
well as tumorigenic ability. Systematic analyses re- 
vealed that the expression of genes enriched in 
LFS-derived OBs strongly correlated with decreased 
time to tumor recurrence and poor patient survival. 
Furthermore, LFS OBs exhibited impaired upregula- 
tion of the imprinted gene H19 during osteogenesis. 
Restoration of H19 expression in LFS OBs facilitated
osteoblastic differentiation and repressed tumorigenic
potential. By integrating human imprinted
gene network (IGN) into functional genomic analyses,
we found that H19 mediates suppression of
LFS-associated OS through the IGN component
DECORIN (DCN). In summary, these findings demonstrate
the feasibility of studying inherited human cancer
syndromes with iPSCs.



INTRODUCTION 

Li-Fraumeni syndrome (LFS) is a genetically heterogeneous 
inherited cancer syndrome characterized by autosomal domi- 
nance and early onset of often multiple independent tumors 
within affected family members (Li and Fraumeni, 1969). In 
contrast to other inherited cancer syndromes predominantly 
characterized by site-specific cancers, LFS patients present 
with a variety of tumor types, including osteosarcoma (OS),
soft tissue sarcoma, breast cancer, brain tumor, leukemia,
and adrenocortical carcinoma. Germline mutations in the TP53 
gene encoding the tumor suppressor p53 are responsible for 
LFS (Malkin et al., 1990). Mutations in p53 usually not only 
abolish normal p53 function but are also associated with addi- 
tional oncogenic activities. Despite the prevalence of p53 muta- 
tions, the simultaneous presence of alterations in other tumor 
suppressors (e.g., RB1 and LKB1) and oncogenes (KRAS and
HER2) makes it extremely difficult to study the specific role of 
p53 in cancer development. LFS provides an ideal genetic model 
system for investigating such a role. Although murine LFS 
models have been generated (Hanel et al., 2013; Lang et al., 
2004; Olive et al., 2004), they do not fully recapitulate the tumor 
spectrum found in LFS patients. Therefore, other model systems 
are needed in order to further decipher mutant p53-associated 
pathogenesis. 

Comprising almost 60% of the common histological bone sar- 
coma subtypes, OS is the most frequent primary non-hemato- 
logical malignancy in childhood and adolescence (Tang et al., 
2008). Despite advances in surgery and multi-agent chemo- 
therapy, the survival rate has not increased in the past 40 years 
as much as for other malignancies. After leukemia, OS is the 
second leading cause of cancer mortality among children and 
adolescents and has been described as a cancer syndrome 
with a differentiation deficiency. OS exhibits osteoblast (OB)- 
like features and sustains undifferentiated OBs (Haydon et al., 
2007). Furthermore, genetic alterations (e.g., p53 mutation and 
RB deletion) are strongly associated with OS development. 
Although the association of TP53 mutation with OS is strongly
supported by the high risk of OS in LFS patients (Porter et al.,
1992), the underlying mechanism by which this mutation triggers
OS development is still unclear.

H19 is a maternally imprinted gene encoding a long non-coding
RNA (lncRNA). Alterations in the expression of genes in the
H19-IGF2 imprint locus are linked to both Beckwith-Wiedemann
syndrome (BWS) and Russell-Silver syndrome (RSS) (Choufani
et al., 2010; Eggermann, 2010). Gain of methylation of the upstream
H19 imprinting center (IC1) leading to H19 inactivation
and IGF2 activation is found in 5%-10% of BWS patients and in
>25% of patients with Wilms tumor, hepatoblastoma, and
rhabdomyosarcoma (Choufani et al., 2010). Although the H19-IGF2
imprinting mechanism has been well documented and serves
as a paradigm for the study of epigenetic regulation, the functions
of H19 in biological and pathological molecular regulatory
processes remain nebulous. Recently, Varrault and colleagues
meta-analyzed the set of strongly correlated genes in microarray
data sets to infer the “Imprinted Gene Network” (IGN), of
which H19 is a member. This IGN may be part of the complex
regulatory system that induces rapid but controlled growth during
development (Varrault et al., 2006). H19 has been suggested
to regulate embryonic growth and differentiation by controlling
the expression of IGF2 and several other interconnected imprinted
genes, thus fine-tuning the equilibrium of growth activation
and repression (Gabory et al., 2009). These findings suggest
that H19 may execute its biological functions through the IGN.

Modeling human genetic diseases has been facilitated by 
induced pluripotent stem cell (iPSC) methodologies (Takahashi 
et al., 2007; Takahashi and Yamanaka, 2006; Yu et al., 2007). 
Although iPSCs are widely utilized in the study of various genetic 
diseases with either Mendelian or complex inheritance, their 
application in cancer research has been much less extensively 
explored. In the present study, we have modeled LFS-associ- 
ated OS by using OBs derived from LFS patient-specific iPSCs 
and were able to recapitulate disease characteristics. The LFS 
iPSC-derived OBs displayed a clear OS gene expression signa- 
ture whose particular transcriptional spectra strongly correlate 
with clinical prognosis. By integrating global transcriptional and 
computational analyses, we demonstrated that downregulation
of H19 and its associated IGN component DECORIN (DCN)
is responsible for LFS-associated OS development. Restoring
H19 expression facilitates OB differentiation and inhibits tumorigenesis.
Downregulation of DCN impairs H19-mediated osteogenic
differentiation and tumor suppression. In summary, our
results suggest that p53 mutation-mediated H19 and IGN inactivation
may contribute to OS development in LFS patients and
that induction of H19 expression may have important implications
for the future treatment or prevention of LFS-associated
OS and/or OS with somatically acquired p53 mutations.

RESULTS 

Generation and Characterization of LFS iPSCs 

To elucidate how p53 mutation results in tumor development, we
generated iPSCs from patient fibroblasts obtained from an LFS
family representing three LFS patients and two unaffected individuals
(Figure S1A). The three patients have a heterozygous
c.734G>A mutation that causes a G245D missense substitution.
This site is one of the hot-spot p53 mutations in both LFS patients
and somatic tumors (Varley, 2003). These patients present
with a broad spectrum of tumors, including OS, neurilemmoma
and astrocytoma (Figure S1A). The fibroblast samples displayed
a normal karyotype under low passage (Mirzayans et al., 2010).
Genome sequencing further confirmed heterozygous G245D
mutations in LFS fibroblasts (Figure S1B). Using non-integrating
Sendai virus (SeV)-based delivery of the four Yamanaka reprogramming
factors, OCT4, SOX2, KLF4, and c-MYC (Fusaki et al.,
2009; Takahashi et al., 2007), we established a number of iPSC
clones from the affected and unaffected family members. These
iPSC clones all demonstrate hESC morphology and express
pluripotency factors (NANOG, SOX2 and OCT4), surface
markers (TRA-1-81 and SSEA4) and alkaline phosphatase
(Figure 1A). The lines also show expression of pluripotency markers
at levels comparable to H9 and HES2 hESCs by quantitative (q)
RT-PCR and have a more open and demethylated OCT4 promoter
than the original fibroblasts (Figures 1B and 1C). We verified
loss of SeV and exogenous OCT4, SOX2, KLF4, and c-MYC
transgenes (Figures S1C and S1D), demonstrating that these
iPSCs are zero-genetic footprint. Importantly, the iPSC lines
were karyotypically normal (Figure S1E) and demonstrated the
capacity to differentiate into all three germ layers in vitro (data
not shown) and in teratomas (Figure 1D). All characterizations
of wild-type (WT) and LFS iPSCs are summarized in Table S1.
Together, these data indicate that somatic cells from LFS patients
can be properly reprogrammed, maintain a pluripotent
state and can be effectively differentiated.

Impairment of p53 Function in LFS iPSC-Derived 
Mesenchymal Stem Cells 

As mentioned previously, OS, notably featuring defective OB
differentiation, is one of the major cancers affecting this LFS family.
Therefore, we applied our iPSC model to study how mutant p53
interferes with OB differentiation and to investigate the molecular
alterations caused by p53 mutation in OS development. Human
OBs can be induced from hESC-derived multipotent mesenchymal
stem cells (MSCs) that can give rise to bone, cartilage,
muscle, and adipose tissues. We first differentiated WT and
LFS iPSCs to MSCs by treating them with FGF2 and PDGF-AB
and sorting CD105+/CD24- cells (Figure S2A). These cells also
expressed the MSC surface markers CD44, CD73, CD105, and
CD166, and the MSC-related transcription factor SNAI1 as well
as VIM (Figures 2A and S2B). The cells could be maintained for
2 months without loss of their MSC characteristics (Figure S2C)
(Lian et al., 2007). In comparison with WT MSCs, LFS MSCs
showed no mRNA expression differences of p53, MSC-associated
transcription factors, and osteoblastic-associated factors
(Figures 2B, right, S2D and S2E). Nevertheless, LFS MSCs
showed lower mRNA expression levels of p53 targets p21 and
MDM2 (Figure 2B, left and middle). Compared with p53(WT),
p53(G245D) showed reduced binding to the p21 and MDM2
promoters by chromatin-immunoprecipitation (ChIP)-PCR analysis
(Figure 2C), consistent with impaired transcriptional activity
(Figure 2B). Upon MDM2 inhibitor Nutlin-3 treatment, expression
of numerous p53 target genes (p21, MDM2, SFN, NOXA, FAS,
TNFRSF10B, and GADD45A) was upregulated in WT MSCs,
but this effect was blunted in LFS MSCs (Figure 2D). All characteristics
of WT and LFS MSCs are summarized in Table S1. These
studies demonstrate that LFS MSCs not only maintain MSC characteristics
identical to WT MSCs but also retain the defective p53
function of the parental fibroblasts (Barley et al., 1998).

Recapitulating OS Characteristics in LFS 
MSC-Derived OBs 

Since it was previously suggested that impairment of p53 func- 
tion leads to OS (Walkley et al., 2008) and clinical OS samples 
Figure 1. LFS iPSCs Are Pluripotent

(A) SeV-4F (OCT4, SOX2, KLF4, and c-MYC) reprogrammed LFS and wild-type (WT) iPSCs derived from the same family express hESC pluripotency factors,
hESC surface markers, and AP activity.

(B) qRT-PCR assay for expression of endogenous human NANOG, SOX2, OCT4, DPPA4, REX1, and TERT in iPSCs and parental fibroblasts. PCR reactions are
normalized to GAPDH and plotted relative to expression levels in human H9 ESCs. Error bars indicate ± SEM of triplicates.

(C) Bisulfite sequencing analysis of the OCT4 promoter showing CpG hypomethylation in WT and LFS iPSCs relative to the parental fibroblasts. The cell line and
percentage of CpG methylation are indicated to the left of each cluster. Closed circle, methylated CpG; open circle, unmethylated CpG.

(D) In vivo teratoma formation assay demonstrates LFS iPSC capacity to differentiate into the three germ layers. H&E-stained teratomas contain embryonic
tissues of all three germ layers, including enamel epithelium (endoderm), neural tube epithelium (ectoderm), and cartilage corpuscle (mesoderm). Scale bar, 100 μm.
See also Figure S1 and Table S1.
Figure 2. Defective p53 Activity in LFS MSCs

(A) Surface antigen profiling of LFS MSCs by flow cytometry demonstrating the CD24−, CD105+, and CD44+ fractions in differentiated MSCs.

(B) qRT-PCR analysis for expression of p53 and its downstream target genes, p21 and MDM2, in the WT MSC group (11 lines) and the LFS MSC group (13 lines).
The p53 mRNA levels do not show a significant difference between WT and LFS MSCs, while levels of p21 and MDM2 are significantly lower in LFS MSCs.

(C) ChIP-PCR demonstrating lower p53 binding to the p21 and MDM2 promoter regions in LFS MSCs. IgG ChIP is a negative control. Error bars indicate ± SEM
of triplicates.

(D) qRT-PCR for expression of p53 and its target genes after treatment of LFS and WT MSCs with the MDM2 inhibitor Nutlin-3 for 6 hr. Upregulation of the majority
of p53 target genes is impaired in LFS MSCs in comparison with WT MSCs despite similar p53 expression in both MSC groups.

qRT-PCR data are represented as mean ± SEM; n = 3. See also Figure S2 and Table S1.
Figure 3. LFS OBs Show Differentiation Defects and Oncogenic Properties

(A and B) Both AP (A) and alizarin red S (B) staining reveal the attenuation of OB differentiation in LFS OBs.

(C and D) Expression of OB lineage markers (C) and transcriptional regulators (D) during the OB differentiation time course.



are largely composed of poorly differentiated or undifferentiated
OBs (Tang et al., 2008), we asked if dysregulation of p53 sig- 
naling is responsible for the observed differentiation defects. 
LFS MSCs were induced to the OB lineage and the differentiation 
process was monitored over time. Several p53 targets were 
gradually induced in WT but not LFS OBs during differentiation 
(Figure S3A). Consistently, ChIP-PCR showed significantly 
reduced p53 binding to the p21 and MDM2 promoters in LFS 
OBs (Figure S3B). The decrease in p53 transcriptional activity 
in late stage osteogenic differentiation (day 17) was confirmed 
by a p53 reporter assay (Figure S3C). These findings suggest 
that p53 signaling is active in WT OBs but impaired in LFS 
OBs. AP staining for detecting bone-associated ALPL enzyme 
activity (Figure 3A) and alizarin red S staining reflective of mineral 
deposition by functional OBs (Figure 3B) showed slower differ- 
entiation in LFS MSCs. Mineral precipitations were observed 
on the surface of Petri dishes in WT but not LFS OBs (Figure S3D). 
Consistently, in comparison with WT OBs, LFS OBs showed
lower expression of COL1A1 and ALPL (pre-OB markers),
BGLAP/Osteocalcin and PTH1R (mature OB markers), as well
as FGF23 and MEPE (osteocyte markers) during osteogenesis
(Figure 3C). Immunostaining confirmed that LFS OBs expressed
lower levels of BGLAP than WT OBs did (Figure S3E). Because
osteogenic differentiation is controlled by several core
transcriptional/epigenetic regulators, we monitored their expression
levels during OB differentiation from LFS MSCs. Indeed, we
found impaired upregulation of ZNF521 and ZEB1 (Figure 3D),
indicating a defect in the normal OB gene regulatory network.
Knockdown of p53 resulted in upregulation of osteogenic markers
in LFS OBs and eliminated the osteogenic differentiation
defect (Figure S3F), indicating that p53(G245D) may exert
gain-of-function instead of loss-of-function effects in inhibiting
osteogenic differentiation. Moreover, we noticed LFS OBs growing in
randomly oriented piled-up foci rather than the two-dimensional
monolayers of flattened cells (Figure S3G), which is generally regarded
as a manifestation of a transformed phenotype and an initiating
step in tumorigenesis. To further investigate whether LFS OBs
are able to recapitulate tumorigenic potential, we performed 
in vitro anchorage-independent growth (AIG) assays and in vivo 
xenografts. AIG assays showed clonal growth in soft agar by
many LFS OBs (6 out of 9) but not by WT OBs or by undifferentiated
LFS or WT MSCs (Figure 3E). Performing xenografts in nude
mice, we found tumorigenic ability in LFS OBs but not WT OBs
(Figure 3F, right). The tumors demonstrated immature OB
characteristics, including AP activity and collagen matrix deposition but
not mineralization (Figure 3F, left). The lack of in vitro and in vivo
tumorigenic ability in LFS MSCs implies that OS may originate 
from immature or poorly differentiated OBs rather than MSCs 
(Figures 3E and 3F). To examine whether LFS OBs are able to 
gain malignancy during tumor progression, we performed serial 



transplantation using an in ovo chick embryo chorioallantoic 
membrane (CAM) model. As shown in Figure S3H, the tumor 
sizes of LFS OBs increased in the second transplantation in 
comparison with the first transplantation. These results imply 
that similar to OS cells, LFS OBs contain a population of potential 
tumor-initiating cells (TICs) and these cells are enriched during in 
ovo primary transplantation and fuel secondary tumor growth. 
Interestingly, no additional increases in malignant tumor growth 
were seen following a tertiary transplantation. The persistence of 
a WT p53 allele during serial CAM transplantations (Figure S3I) 
provides a possible explanation for why there is no further gain 
of tumor growth in the tertiary transplantation. Taken together, 
these findings demonstrate that OS-related phenotypes (defec- 
tive OB differentiation and tumorigenic ability) can be recapitu- 
lated in LFS iPSC-derived OBs. 

OS Spectrum Is Represented in LFS OBs 

In order to gain insights into LFS-associated osteogenic defects 
and tumorigenesis, the global transcriptome was investigated 
by mRNA-seq during OB differentiation time courses (Table 
S2). Expression profiles of LFS and WT time course samples 
analyzed by Spearman’s rank correlation demonstrated that 
gene expression profiles of LFS samples clustered together 
but were distinct from WT samples (Figure 4A). The non-negative
matrix factorization (NMF) method for extracting relevant
biological correlations based on gene expression data showed
that at day 0 LFS and WT MSCs clustered together, while at
days 7, 14, and 17 differentiating LFS OBs were distinct from their
WT counterparts (Figure 4B). These results suggest that WT
and LFS MSC gene expression profiles are initially similar but 
diverge during subsequent OB differentiation. Alignment of 
reads at individual gene loci and quantification by fragments 
per kilobase of exon per million fragments mapped (FPKM) 
values confirmed the gradual increase of OB marker ALPL and 
skeletal development regulators HOXA10, IGF2, and CLEC3B 
in WT but not in LFS-derived cells (Figure S4A). Gene Ontology 
(GO) analyses using Network2Canvas further revealed that
OB differentiation in WT MSCs (day 17 versus day 0) affects
biological process genes mainly involved in skeletal system
development and cell motility, whereas genes upregulated in LFS
MSCs are primarily associated with an inflammatory response
(Figure 4C, upper and middle). Expression levels of several
skeletal system development-related genes were greatly increased
during the WT osteogenesis time course but not in the LFS
samples (Figure 4C, bottom). Moreover, expression levels of genes
involved in positive regulation of cell differentiation and negative
regulation of cell proliferation were significantly increased in WT
OBs. In contrast, genes involved in positive regulation of cell
cycle and mitosis were enriched in LFS OBs (Figure 4C, upper
and middle). Using the Mouse Gene Atlas database, genes



(E) In vitro AIG assay for tumorigenicity demonstrates colony formation by LFS OBs but not by WT OBs. Positive colonies after 1 month growth of
differentiated OBs in either MSC or OB differentiation media are those larger than 50 μm (scale bar, 50 μm).

(F) Tumor xenograft experiments by subcutaneous injection in NU/NU mice demonstrate that LFS OBs but not MSCs recapture in vivo tumorigenic ability. The
LFS2-B OB-derived tumors were examined by H&E, AP, picrosirius red, and von Kossa stains to examine morphology, bone-associated AP expression, collagen
production, and mineral deposits, respectively. Error bars represent ± SEM. Scale bar, 1 cm.

(C-E) Error bars represent ± SEM; n = 3.

See also Figure S3 and Table S1.



Cell 161, 240-254, April 9, 2015 ©2015 Elsevier Inc. 245




[Figure 4 graphic not reproduced. Labels recovered from the image: Spearman's rank correlation coefficient matrix for samples LFS1-A, LFS2-B, and WT1-A at days 0, 7, 14, and 17; NMF sample clustering (K = 4); ranking of GO biological processes for WT OBs (skeletal system development, positive regulation of cell differentiation, negative regulation of cell proliferation) and LFS OBs (inflammatory response, positive regulation of cell cycle, positive regulation of mitosis); heat map of PTH1R, MSX2, PRELP, FGFR1, FRZB, CMKLR1, TWIST1, COL9A2, NPR3, HOXA13, ACAN, COMP, IGFBP4, ALPL, ANKH, IGF2, and CLEC3B at days 0, 7, 14, and 17; GSEA plots of enriched OS-associated and OB-associated genes (high in WT OBs versus high in LFS OBs; day 14, NES = -1.48, NOM p-val = 0.00, FDR q-val = 0.00); survival comparison of Group I (22) and Group II (15), log-rank test p = 0.009; CREA comparisons LFS OBs vs. WT OBs, LFS OS vs. WT OBs, LFS OS vs. LFS OBs, OS vs. DiffOB, and SS vs. Muscle (down, up, insignificant).]

Figure 4. Genome-Wide Transcriptome Analysis Reveals that LFS OBs Possess an OS Signature 

(A) Correlation matrix of LFS and WT osteogenic time course mRNA-seq results. 

(B) Ordered tree linkage displaying sample clustering and metagenes representing the most variability associated with each differentiation transition. 

upregulated at day 17 in WT OBs were more similar to the later-stage
differentiated mouse OB gene profiles than were those in LFS OBs
(Figure S4B). Further analyses of day 17 WT and LFS OB gene
expression revealed that the expression pattern in WT OBs is
similar to that of mouse OBs at day 21. In contrast, the gene
expression pattern of LFS OBs is closest to that of mouse
OBs at day 5 (Figure S4C). Consistent with our qRT-PCR analyses
of p53 target gene expression (Figure S3A), gene set
enrichment analysis (GSEA) confirmed that many known and
predicted targets enriched in WT OBs relative to LFS OBs are
found disproportionately in the set of significantly (>3-fold)
upregulated genes in p53-transduced OS cells (Figure S4D). This
supports the dysregulation of p53 function that occurs during
LFS OB differentiation.
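
To make the two quantities used in this section concrete, the following minimal Python sketch computes an FPKM value from raw fragment counts and a Spearman's rank correlation between two expression profiles. It is an illustration only and is not the pipeline used in this study; the function names are hypothetical, and a real analysis would rely on established RNA-seq software.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped."""
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Applying `spearman` to every pair of samples yields a correlation matrix of the kind summarized in Figure 4A; because only ranks enter the calculation, the result is insensitive to monotonic rescaling of the FPKM values.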

We next asked if the gene expression in LFS OBs is consistent 
with an oncogenic signature derived from OS cell lines. We iden- 
tified both OB and OS signature genes by using GenePattern to 
compare the gene expression profiles between OB cells and OS 
lines (GSE39262). We then applied GSEA to compare enriched 
expressed genes between WT and LFS OBs against these signa- 
tures. As shown in Figure 4D, OS-associated genes are specif- 
ically enriched in LFS OBs; in contrast, OB-associated genes 
are specifically enriched in WT OBs. This finding strongly sug- 
gests that by the time of their generation, LFS OBs have already 
acquired OS characteristics. Furthermore, to determine whether 
the gene expression profiles of LFS OBs can potentially have 
prognostic value as measured by patient survival and tumor 
recurrence, we performed Kaplan-Meier analyses restricted to 
two separate subsets of patients, those who had or did not 
have an enriched LFS iPSC-derived OB gene expression signa- 
ture as defined by enriched genes in LFS OBs versus WT OBs at 
day 17. We found that the enriched LFS OB-associated gene 
signature was significantly correlated with more rapid tumor 
recurrence and poorer survival (p = 0.004 and p = 0.009, respec- 
tively) (Figure 4E). In summary, our LFS iPSC model not only re- 
captures the OS signature but can also predict clinical outcomes 
in patients. 
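
The survival comparison described above can be sketched in a few lines of Python. The block below is a minimal, assumption-laden illustration of a Kaplan-Meier estimator and a two-group log-rank test; it does not reproduce the patient data or the p values reported here, and the function names and argument layout are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (death or recurrence) was observed, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        died = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - died / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for u, _, _ in pooled if u >= t)  # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)
        d = sum(1 for u, e, _ in pooled if u == t and e)  # events at time t
        d_a = sum(1 for u, e, g in pooled if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group-A event count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 degree of freedom
    return chi2, p_value
```

In this framing, patients would first be split into two groups according to whether their tumors carry the LFS OB signature, and the log-rank test then asks whether the two survival (or recurrence-free) curves differ more than expected by chance.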

Cytogenetic analyses of human OS have revealed numerous 
genomic alterations and rearrangements (Batanian et al., 2002; 
Bridge et al., 1993) that have been consistently replicated in a 
murine p53 conditional knockout OS model (Walkley et al., 
2008). Since genomic alterations are common during cancer 
progression, it has been challenging to factor out their effects 
when attempting to interrogate the roles of tumor suppressor 
genes or oncogenes in tumor progression. To investigate if 
our LFS OBs provide a unique model to study early stages of 
tumor progression, we applied in silico cytogenetic region 
enrichment analysis (CREA) to our LFS OB samples to identify 
the potential presence of rearranged regions commonly found 
in human OS. The LFS OBs were compared with the gene
expression signature of WT OBs at day 17 of OB differentiation,
while the synovial sarcoma (SS) was compared with normal
muscle. As expected, OS cell lines showed significant enrichment
at 10 of the 18 regions with known cytogenetic alterations
in OS. In contrast, human SS were generally not associated with 
any alterations in these regions (Figure 4F), suggesting that 
these chromosomal regions are a specific feature of OS rather 
than a common feature of other cancers. Notably, in compari- 
son with WT OBs, both LFS OBs and tumors showed negligible 
enrichment in these regions (1 and 2 out of 18 regions, respec- 
tively), implying that chromosomal rearrangements barely occur 
in these LFS OB-derived tumors (Figure 4F). These results 
demonstrate that LFS iPSC-derived OBs can serve as a useful 
system to study the early stages of OS progression caused 
solely by p53 mutation without interference by secondary 
genomic alterations. 

Impaired H19 Expression in LFS-Associated OS

Among 421 differentially expressed genes identified in comparisons
between WT and LFS OBs, H19, highly expressed in OBs
but not in bone marrow or osteoclasts (Figure S5A), warranted
further in-depth analyses. Alignment and quantification of reads
at the H19 locus and qRT-PCR showed H19 upregulation in WT
but not LFS OBs during osteogenesis (Figures 5A and 5B). The
low expression of H19 in LFS OBs was further confirmed in
multiple LFS iPSC-derived OBs (Figure 5C). In comparison with
bone/OB tissues and p53 WT cells, H19 expression is significantly
decreased in OS and p53 mutant cells, respectively
(Figures 5D and 5E). These findings suggested that H19
dysregulation is a common phenomenon in OS and is correlated
with p53 status. RNAi-mediated knockdown of H19 in WT OBs
led to decreased expression of the osteogenic factors ZEB1 and
ZNF521 and the pre-osteoblastic markers ALPL and COL1A1, as well
as decreased AP activity (Figure 5F). Supporting the positive regulatory
role of H19 in osteogenesis, ectopic expression of H19 in LFS
MSCs resulted in increased osteogenic marker expression
and reactivation of OBs with consequent mineral deposition
(Figure 5G). Moreover, AIG and oncosphere assays demonstrated
that in vitro tumorigenic activities of LFS OBs and OS TICs
were suppressed by re-expressing H19 (Figures 5H and 5I). In
LFS OBs assayed in ovo with the CAM assay and in vivo by
the mouse tumor xenograft model, restoration of H19 expression
not only reduced the incidence of tumor development but also
decreased tumor size (Figures 5J and 5K). To further investigate
if restoration of H19 has any therapeutic potential for OS
treatment, H19 was transduced into OS cell lines, OSA and HOS.
As shown in Figure S5B, H19 reduced the incidence of OS tumor
development as well as tumor size, suggesting H19 as a
therapeutic target. In comparison with the original OSA tumor, the
H19-transduced OSA tumor demonstrated poorly differentiated
osteoblastic characteristics including positive AP activity and
collagen matrix deposition but not mineralization, implying that



(C) GO biological processes associated with upregulated genes in WT and LFS OBs at day 17. FPKM values of skeletal system developmental genes are plotted
as a heat map demonstrating their upregulation during WT but not LFS-derived osteogenic differentiation.

(D) GSEA indicates that OS-associated genes are enriched in LFS OBs while normal OB-associated genes are enriched in WT OBs.

(E) OS patients with the LFS OB signature show more rapid tumor recurrence and poorer survival.

(F) CREA analysis reveals chromosomal integrity of LFS OBs and tumors engrafted in nude mice. 

See also Figure S4 and Tables S2 and S6. 
due to heterogeneity, a small portion of OS may gain the ability
to escape H19-induced OB differentiation and tumorigenic
suppression. In summary, these findings emphasize the essential
role of H19 in regulating osteogenesis and the potential of
re-expressed H19 to rescue defective osteogenesis and suppress
tumor growth in both LFS iPSC-derived OBs and OS cell lines.

Since H19 expression has been shown to be suppressed by 
p53 (Dugimont et al., 1998), we asked if this regulation was found 
in LFS patients with mutant p53. Consistent with the previous 
findings, activation of p53 by Nutlin-3 treatment decreased 
H19 expression by 20%-39% (Figure S5C) and RNAi-mediated
p53 knockdown by two distinct RNAi molecules resulted in a
1.4- to 2.4-fold increase of H19 expression in WT MSCs
(Figure S5D). In contrast, while suppression of H19 expression
was not detected in Nutlin-3-treated LFS MSCs (Figure S5C),
RNAi-mediated p53 knockdown led to a significant increase
in H19 expression (7.1- to 19.3-fold) (Figure S5D), strongly
suggesting that p53 mutants, at least p53(G245D), exert a
gain-of-function effect in repressing H19. In agreement with the
hypothesis that p53 does not bind the H19 promoter region and that
p53-mediated H19 repression occurs through other factors
(Dugimont et al., 1998), p53 ChIP showed no enrichment of
p53 binding in comparison with a control IgG pull-down
(Figure S5E). To further explore whether other p53 mutants could
promote this regulation, we transfected multiple variants of
p53 into WT MSCs and examined their effects on H19
expression. As with p53(G245D), many p53 hotspot mutants (R175H,
G245S, G248W, and R280K) exhibited stronger inhibition of
H19 expression than did WT p53 (Figure S5F). This result
demonstrates not only that the suppression of H19 expression by p53
mutants is common in LFS-associated OS but also that this is
a general mechanism found in other LFS patients with distinct
p53 mutations. Since it was shown that the tumorigenic ability
of several p53 mutants is at least partially accounted for by their
interaction with and inhibition of p63 and/or p73 functions (Di
Como et al., 1999; Gaiddon et al., 2001), we investigated
whether p53(G245D) could inhibit H19 expression directly via
p63 and p73. Exogenous coimmunoprecipitation (coIP) showed
that p53(G245D) and p53(R175H) but not p53(WT) can interact
with p63 and p73 (Figure S5G). Depletion of p53(G245D) by
siRNA increased p21mini promoter activity in two LFS-derived
cell lines, LFS2-B(ΔWT)-1 and LFS2-B(ΔWT)-2 MSCs, that lacked
a functional wild-type p53 allele due to insertion of a Neo
selection marker (D.-F. Lee and I.R. Lemischka, unpublished data).



Both p63 and p73 were able to activate the p21mini promoter
to a greater extent upon p53(G245D) knockdown (Figure S5H),
confirming the dominant-negative activity of p53(G245D) in
regulating p63 and p73 function. However, ectopic expression
of p63 and p73 does not alter H19 expression in WT MSCs
(Figure S5I). These findings suggest that although one of the
p53(G245D) gain-of-function effects in regulating LFS OS
pathogenesis is through the suppression of normal p63/p73 function,
this regulation is not involved in H19 transcriptional regulation.

Recent studies have found that p53 status may affect DNA 
methylation in the H19 genomic locus in a clone-specific manner 
in iPSCs (Yi et al., 2012). To investigate whether impaired
upregulation of H19 in LFS OBs is caused by hypermethylation of
the H19 locus, differentiating LFS OBs were treated with the
demethylating agent 5-aza-2'-deoxycytidine (Decitabine). As shown in
Figure S5J, H19 expression in LFS OBs was slightly increased
upon Decitabine treatment but remained significantly lower
than in WT OBs during OB differentiation, ruling out the
possibility that lower H19 expression in LFS OBs is due to H19 locus
methylation. In fact, Decitabine-treated LFS OBs showed
impaired OB differentiation ability (Figure S5K) and slightly
increased in vitro tumorigenic potential (Figures S5L and S5M),
suggesting that the clinical application of demethylating
reagents, at least Decitabine, in treating LFS patients with OS is
unlikely to provide much benefit.

Involvement of Human IGN in Osteogenesis and 
Neurogenesis 

Mouse H19 controls cell growth and development by regulating
the expression of several imprinted genes within the IGN (Gabory
et al., 2009). Interestingly, in comparison with other
bone-associated tissues, the expression of mouse imprinted genes is
enriched in differentiated OBs and many of these are themselves
members of the IGN (9 out of 15; i.e., H19, Ndn, Igf2, Peg3,
Zac1, Sgce, Dlk1, Mest, and Cdkn1c) (Figure S6A). Additionally,
in comparison to normal mouse OBs, these imprinted genes,
including H19, are significantly downregulated in mouse OS
(Figure S6B). These results imply that the IGN may have a role
in osteogenesis and that its dysregulation may promote OS in
mice. We hypothesized that H19 suppresses LFS-associated
OS through the imprinted gene regulatory system. Since a
human IGN was not yet established, we first searched for genes
frequently coexpressed with human imprinted genes and built
a human IGN from a database of 79 human tissues (178 arrays)



Figure 5. Involvement of H19 in LFS OB-Associated Defective OB Differentiation and Tumorigenesis 

(A) Visualization of mRNA-seq short read mapping of H19 in UCSC Genome Browser. 

(B) qRT-PCR shows impaired upregulation of H19 in LFS OBs during osteogenesis. Error bars indicate ± SEM; n = 3.

(C) Multiple LFS OBs have impaired upregulation of H19 during OB differentiation.

(D and E) H19 expression is notably decreased in OS and p53 mutant cells. The analyses were performed using microarray data from GEO dataset GSE36001.

(F) RNAi-mediated knockdown of H19 in WT MSCs leads to decreased expression of osteogenic genes as well as AP activity. qRT-PCR data are represented as
mean ± SEM; n = 3.

(G) Ectopic expression of H19 in LFS MSCs increases osteogenic gene expression and facilitates OB maturation. qRT-PCR data are represented as mean ± SEM;
n = 3. Scale bar, 100 μm.

(H and I) AIG (H) and oncosphere (I) assays show repressed in vitro tumorigenic ability of LFS OBs upon restoration of H19. Error bars are ± SEM; n = 3. Scale
bar, 100 μm.

(J) In ovo CAM assay indicates H19 suppresses tumorigenic ability of LFS OBs.

(K) In vivo tumor xenograft experiments indicating H19 suppression of tumorigenic ability of LFS OBs. Error bars represent ± SEM.

See also Figure S5. 
Figure 6. A Network of Coregulated Human Imprinted Genes

(A) Genes linked to human imprinted genes, including H19, identified from a set of 79 human tissues. Fifty-two imprinted genes are extracted from a total of 63 known
and putative human imprinted genes. The main human IGN consists of 16 imprinted genes and contains two sub-networks.

(B) Network2Canvas analysis of these 16 imprinted coregulated genes by GO biological processes reveals that DCN-associated IGN primarily participates in 
osteogenesis and NDN-associated IGN in neurogenesis. 

See also Figure S6 and Table S3. 



(human U133A/GNF1H Gene Atlas; GSE1133). Fifty-two
imprinted genes were extracted from a total of 63 known and
putative human imprinted genes (http://www.geneimprint.com/).
The main human IGN is composed of 16 imprinted genes and
divided into two sub-networks, the DCN and the NDN sub-IGN
(Figure 6A). The DCN sub-IGN includes four imprinted genes
(H19, DCN, IGF2, and CPA4) and the NDN sub-IGN contains 12
imprinted genes (NDN, PEG3, NNAT, MEG3, GNAS, MAGI2,
COPG2IT1, GRB10, DLGAP2, SNRPN, NTM, and BLCAP).
Tissue-specific gene expression defines their unique biological
roles, and the enriched expressed genes, in general, have a
role in maintaining tissue or cell-specific functions. Imprinted
genes have been suggested to execute their functions through
modulating the expression of their coregulated genes.
Accordingly, we analyzed coregulated gene expression associated
with these 16 imprinted genes (Table S3) using the Mouse
Gene Atlas database. Interestingly, we found that the
coregulated genes associated with individual imprinted genes in the
DCN sub-IGN are primarily implicated in osteogenesis. In
contrast, the coregulated genes associated with the NDN
sub-IGN are mainly implicated in neurogenesis (Figure 6B). These
findings strongly indicate that although individual imprinted
genes participate in more diverse biological functions, the main
roles of the entire IGN are in osteogenesis and neurogenesis.
Gene expression analyses of 318 H19 coregulated genes
suggest that these may function in early OB differentiation as well
as lung, lactating mammary gland, and ciliary body development
(Figure S6C). Interestingly, expression of these coregulated
genes is mainly enriched in early but not late OB differentiation.
This led us to suspect that H19 may function as an initiator of
osteogenesis and that downstream molecules, including
coregulated imprinted genes, may function during later osteogenic
stages.
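
The idea of linking coexpressed genes into a network and reading off its sub-networks can be illustrated with a short Python sketch. The exact coexpression criterion used to build the human IGN is not described in this section, so the choice of Pearson correlation, the 0.8 threshold, and all function names below are assumptions made purely for illustration.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def coexpression_network(profiles, threshold=0.8):
    """Link two genes when their profiles across tissues correlate at or above threshold.

    profiles  -- dict mapping gene name to a list of expression values, one per tissue
    threshold -- assumed cutoff; not taken from the study
    """
    edges = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            edges.add((g1, g2))
    return edges


def connected_components(genes, edges):
    """Group genes into sub-networks by following edges (depth-first search)."""
    neighbors = {g: set() for g in genes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for gene in sorted(genes):
        if gene in seen:
            continue
        stack, component = [gene], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components
```

With real data, `profiles` would hold one expression vector per gene across the 79 tissues, and the resulting components would correspond to candidate sub-networks such as the DCN and NDN sub-IGNs.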

DCN Functions Downstream of H19 in LFS-Associated
OS Development

We next asked whether genes in the H19-associated IGN are
responsible for the observed H19-modulated OB differentiation
and oncogenic repression. We noticed that the imprinted gene
DCN, encoding a matrix proteoglycan, is directly linked to H19
in the IGN (Figure 6B) and that DCN-coregulated genes are
relatively overexpressed in OBs in a gradually increasing pattern
during osteogenesis (Figure 7A). Because we also realized that
the biological functions of DCN-coregulated genes are mainly
in cell adhesion and the cell matrix (Figure 7B), both of which
are known to regulate OB differentiation (Sosa-Garcia et al.,
2010), we pursued this candidate further. Alignment and
quantification of reads at the DCN locus by FPKM values and qRT-PCR
showed that DCN is gradually upregulated in WT but not LFS
OBs during differentiation (Figures S7A and S7B). Decreased
DCN expression in LFS OBs was further confirmed in OBs
from multiple LFS iPSC lines (Figure S7C). DCN expression
was significantly decreased in OS and p53 mutant cells in
comparison with bone/OB tissues and p53 WT cells (Figures S7D and
S7E). RNAi-mediated knockdown of DCN led to decreased
expression of the osteogenic factor ZEB1, ALPL, and several
osteogenic differentiation-associated genes (Figure S7F),
supporting an essential role for DCN in normal osteogenesis.
Furthermore, knockdown of H19 downregulated DCN while
ectopic expression of H19 upregulated it in WT and LFS
Figure 7. DCN Functions Downstream of H19 in LFS-Associated OS Development

(A) GO biological processes examined by Network2Canvas indicate DCN coregulated genes are significantly upregulated during osteogenesis. 

(B) Panther analysis indicates DCN-coregulated genes are mainly involved in cell adhesion and cell matrix functions. 

(C) H19-mediated upregulation of pre-osteoblastic marker ALPL is abolished upon DCN depletion. Error bars indicate ± SEM of triplicates. 

(D and E) RNAi-mediated DCN knockdown impairs H19-mediated inhibition of LFS OB tumorigenic activity in vitro (D) and in ovo (E). Error bars indicate ± SEM, 
n = 3 in (D). 

(F) GSEA of DCN-correlated gene expression indicates expression of DCN network genes during OB differentiation that is impaired in LFS OBs and OS cells. 

(G) DCN inhibits both OSA and HOS tumorigenesis. The sizes of HOS and OSA tumors were examined 1 and 2 months after subcutaneous injection, respectively. 
Error bars are ± SEM; n = 6. 

See also Figure S7. 



MSCs, respectively (Figure S7G). In agreement with these re- 
sults, DCN expression was directly correlated with H19 ex- 
pression in both WT and LFS samples as well as in OS cell lines 
(Figures S7H and S7I). In contrast, knockdown of DCN did not 
alter H19 expression (Figure S7J). These findings demonstrate 
that H19 functions as an upstream regulator of DCN expression.
To further elucidate whether DCN functions as a downstream 
regulator involved in H19-mediated OB differentiation and tumor 
suppression, we depleted DCN in LFS OBs expressing ectopic 
H19 and found that H19-mediated upregulation of the pre-oste- 
oblastic marker ALPL was abolished upon DCN knockdown 



(Figure 7C) and that the in vitro and in ovo H19-mediated sup- 
pression of LFS OB tumorigenesis was inhibited (Figures 7D 
and 7E). These findings suggest that H19 regulation of osteogenesis
and suppression of OS is, at least in part, mediated via DCN.
In comparison with MSCs, GSEA of DCN-coregulated genes re- 
vealed their enrichment during OB differentiation (Figure 7F, left). 
Additionally, in comparison with WT OBs and MSC/OB tissues, 
DCN-coregulated genes were significantly decreased in LFS 
OBs and OS (Figure 7F, right), implying that DCN may negatively 
regulate OS development. Supporting its tumor suppressor 
function, ectopic expression of DCN reduced the incidence of 
OSA- and HOS-initiated tumor development as well as tumor size
(Figure 7G). Taken together, these findings reveal that dysregu- 
lation of the H19-DCN IGN is strongly linked to LFS-associated 
osteogenic defects culminating in OS development. 
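
For readers unfamiliar with how GSEA scores a gene set, the following Python sketch computes a simplified, unweighted running-sum enrichment score. This is an assumption-level illustration rather than the analysis performed here: the published GSEA method additionally weights hits by their correlation with the phenotype and normalizes the score (NES) using permutations, neither of which is implemented below. The function name is hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    ranked_genes -- genes ordered from most upregulated to most downregulated
    gene_set     -- the signature being tested (for example, DCN-coregulated genes)
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up at each gene in the set, down at each gene outside it.
        running += 1 / n_hits if is_hit else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score means the signature is concentrated at the top of the ranked list (for example, high in WT OBs), whereas a negative score means it is concentrated at the bottom (for example, high in LFS OBs or OS).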

DISCUSSION 

In vitro modeling of human disease has been greatly facilitated 
by iPSC methodologies (Takahashi et al., 2007; Takahashi and 
Yamanaka, 2006; Yu et al., 2007). Characterized by their ability 
to self-renew indefinitely and differentiate into all cell lineages 
of an organism, iPSCs provide a powerful system for human dis- 
ease modeling. The p53 tumor suppressor is considered a prom- 
ising therapeutic target to treat tumors with p53 mutations or 
deletions (Freed-Pastor and Prives, 2012). However, the lack of 
a reliable model limits the development of useful approaches 
to treat cancers caused by either genetic or somatic p53 mutations.
Rather than clinical patient samples, cancer cell lines and mouse
models have generally been used to study p53 function. Here, we
demonstrate the possibility of using LFS 
iPSCs to turn clinical samples into cell lines and models to study 
the pathological mechanisms caused by mutations in p53. This 
model system not only serves as an alternative tool to study 
p53 mutation-associated disorders but also provides substantial 
benefits for studying the role of p53 in the early stages of tumor 
development. 

The Role of Mutant p53 in Osteogenesis and OS 

A series of studies have demonstrated that the p53 tumor sup- 
pressor promotes differentiation in a variety of cell types (Lee 
et al., 2012a; Molchadsky et al., 2010). Since some types of cancer,
such as OS, are considered undifferentiated, it is logical to
regard the cancer as a defect in protective differentiation
that would normally suppress unchecked cell proliferation and
thus prevent tumor development. However, recent in vivo 
evidence from p53 knockout and conditional MDM2 deletion 
mice, suggesting that wild-type p53 attenuates OB differentia- 
tion and bone development (Lengner et al., 2006; Wang et al., 
2006), makes the situation far more complex. In our current 
studies, we found that H19 promotes OB differentiation and
is repressed by p53, providing a possible explanation
for how p53 can suppress OB differentiation. Strikingly, the
p53(G245D) mutant exerted a gain-of-function effect in repressing
H19 transcription (Figures S5D and S5F), indicating
that the defective OB differentiation in LFS OBs may result
from inhibition of H19-mediated osteogenesis. In contrast to
the p53(G245D) gain-of-function effect in downregulating H19 
expression, this mutant exhibited a partial loss-of-function effect 
in upregulating the majority of known p53 downstream targets 
(e.g., MDM2, p21, and SFN). Advanced systems-level studies 
to characterize p53(G245D) function by identifying genome- 
wide differences between WT and mutant p53 will be needed 
to elucidate the comprehensive mechanisms involved in LFS- 
associated OS development. 

Is H19 a Tumor Suppressor or an Oncogene? 

It has been suggested that H19 may act as a tumor suppressor 
(Hao et al., 1993; Yoshimizu et al., 2008). In contrast, several 



in vitro culture experiments have suggested a controversial 
oncogenic role for H19 (Lustig-Yariv et al., 1997; Verkerk et al.,
1997). This discrepancy could be explained by both differences
in the experimental systems and the complexity of H19 functions
in developmental and physiological processes. In our studies,
H19 is both significantly upregulated during osteogenesis and
commonly downregulated in OS, suggesting both differentiation-promoting
and tumor-suppressing roles. Moreover, ectopic
expression of H19 in LFS OBs restored normal osteogenic differentiation.
H19 acts not only by directly modulating downstream
targets as a lncRNA but also by indirectly controlling an entire group 
of genes via its associated IGN. Notably, many distinct cancers 
are associated with dysregulation of imprinted genes (Joyce and 
Schofield, 1998). Such dysregulation may disrupt the H19-associated
IGN and additional gene networks, offering a
possible explanation for the distinct effects, variously as an
oncogene or a tumor suppressor, observed after ectopic expression
of H19. Moreover, it must also be acknowledged that the 
H19 locus may play more complex roles than regulation of the 
IGN, and these unidentified functions may also play roles in its 
bivalent function in tumorigenesis both in different tissues and 
at different tumor developmental stages. 

LFS iPSC Disease Model: An Alternative System to Study 
the Early Stages of OS Development 

One key feature of clinically apparent OS is its numerous 
chromosomal alterations and rearrangements (Batanian et al., 
2002; Bridge et al., 1993). A high level of genomic instability, in 
particular, chromosomal instability, is commonly found in OS. 
Tumor suppressor genes are frequently lost and oncogenes 
are duplicated. Because genomic instability is not only a conse- 
quence of tumor progression but also an active driver of tumor 
evolution, it creates a heterogeneous cell population and makes 
it more difficult to understand the initial steps of tumorigenesis. In 
comparison with normal differentiated OBs, human OS cell lines 
and a conditional mouse OS model (Walkley et al., 2008) showed 
strong enrichment of certain cytogenetic rearrangement regions. 
Because human/mouse OS lines are only isolated after many 
steps of tumor evolution, using these to study the initial 
stage of OS tumorigenesis is challenging. In marked contrast, 
LFS OBs and tumors showed a negligible degree of common 
OS cytogenetic rearrangements in comparison with WT OBs 
(Figure 4F), demonstrating the existence of a relatively intact 
genome. This relatively undisturbed genome also helps to 
explain the lower rate and weak tumorigenicity of LFS OBs 
in vitro and in vivo. Thus we anticipate that LFS iPSC-derived 
OBs will be used in future studies focused on the role of mutant 
p53 in early OS progression prior to development of broad 
genomic alterations. 

In summary, LFS iPSC-derived OBs not only provide a high-fi- 
delity model system to elucidate the pathological mechanism of 
p53 mutant-associated OS development but also document a 
path for using LFS-associated gene expression patterns to pre- 
dict clinical outcomes. More generally, iPSC approaches will 
also facilitate the definition of inherited versus somatically acquired
causal components in many cancers. Further investigations
to identify the regulatory mechanism of the H19-DCN IGN
and to develop drugs to activate H19 and DCN may have
powerful clinical implications for the treatment and/or prevention 
of OS in patients with either inherited or somatically acquired p53 
mutations. 

EXPERIMENTAL PROCEDURES 

Somatic Cell Programming with Non-Integrating SeV 

The fibroblasts of three LFS patients and two unaffected relatives were 
cultured and maintained in DMEM media supplemented with 10% (vol/vol) Benchmark
FBS (Gemini Bio-Product) and antibiotics. These fibroblasts were reprogrammed
by transducing SeV expressing the four reprogramming factors
OCT4, SOX2, KLF4, and c-MYC (CytoTune reprogramming kit, Invitrogen) according
to the manufacturer's protocol. The reprogramming cells were maintained
in hESC media (DMEM/F12 [Cellgro, Mediatech] containing 20% [vol/vol]
KnockOut Serum Replacement [Invitrogen], L-glutamine, non-essential amino
acids, β-mercaptoethanol, antibiotics and bFGF). At 3-4 weeks post-induction,
individual clones with hESC/iPSC morphology and positive for TRA-1-60 
and TRA-1-81 live staining were picked, passaged on MEFs and examined for 
loss of SeV by both staining with anti-SeV-specific antibody (PD029, MBL) and 
by qRT-PCR measurement of expression of the exogenous four factors. The 
specific qPCR primers targeting exogenous OCT4, SOX2, KLF4, and c-MYC 
are listed in Table S4. 

In Vitro Differentiation of iPSCs to MSCs 

In vitro differentiation of WT and LFS iPSCs to MSCs was performed by a 
well-defined MSC differentiation protocol described previously (Lian et al., 
2007). Briefly, iPSCs were seeded in gelatin-coated plates and cultured in 
MSC-differentiation media (DMEM supplemented with 10% Knockout serum 
replacement, 5 ng/ml FGF2 and 5 ng/ml PDGF-AB [PeproTech]) to induce 
differentiation. When differentiated cells were confluent, cells were trypsinized, 
split, and maintained. After 3 weeks of differentiation, the differentiated MSCs 
were sorted as the CD105 (eBioscience)-positive and CD24 (BD PharMingen)-negative
cells by BD Aria II in the Mount Sinai Flow Cytometry Shared Facility
and expanded in MSC media (DMEM supplemented with 10% FBS). These
differentiated MSCs were further examined for expression of other MSC surface
markers CD44 (BD PharMingen), CD73 (BD PharMingen), and CD166
(BD PharMingen) as well as MSC-associated factors SNAI1 and VIM by immunostaining
with anti-SNAI1 (Santa Cruz) and anti-VIM (Millipore) antibodies. 

In Vitro Osteogenic Differentiation of MSCs 

LFS and WT iPSC-derived MSCs were plated in 12-well plates or 6-well plates at
a density of 1 × 10^4 cells or 2 × 10^4 cells per well, respectively, in osteogenic
differentiation medium (α-MEM supplemented with 10% FBS, 0.1 μM dexamethasone,
10 mM β-glycerol phosphate, and 200 μM ascorbic acid) (Barberi 
et al., 2005). Cells were differentiated for specific time points as noted in the 
main text before characterization. 

In Vitro AIG and Oncosphere Assays 

LFS and WT MSCs were cultured and passaged in 6-well plates at a density
of 2 × 10^4 cells per well. Cells were cultured in OB differentiation medium
for 7 days and split, and 1 × 10^4 cells were resuspended in OB differentiation medium
with 0.4%-0.5% LMP agarose. The cell suspensions were then plated in
12-well plates containing solidified 0.8% agarose in OB differentiation medium.
Cells were maintained in osteogenic differentiation medium for 1 month
with medium changes every 3 days. Colonies (considered to have a diameter >
50 μm) were counted under a microscope. For the oncosphere assay, 2 × 10^4
7-day differentiated osteoblasts were washed with DPBS twice, resuspended
in oncosphere medium (α-MEM supplemented with 0.1 μM dexamethasone,
10 mM β-glycerol phosphate, 200 μM ascorbic acid, B27 supplement,
5 μg/ml heparin, 20 ng/ml bFGF, and 20 ng/ml EGF), and seeded in ultra-low
attachment 6-well plates (3471, Corning). The number of oncospheres
(diameter > 50 μm) was counted after 12 days. 

Xenotransplantation 

All animal procedures were performed in accordance with Mount Sinai’s
Institutional Animal Care and Use Committee (IACUC). 2 × 10^6 Matrigel-mixed 



differentiated osteoblasts, 2 × 10^6 Matrigel-mixed HOS (ATCC CRL-1543),
and 1 × 10^6 OSA (SJSA-1, ATCC CRL-2098) cells were injected subcutaneously
into both right and left hind legs of 8-week-old immunocompromised
nude mice (Charles River Laboratories). Tumors were excised around 6-10 weeks
after injection. Tumors were weighed; fixed overnight in 10% neutral
buffered formalin; embedded in paraffin; sectioned; and stained with H&E, AP, 
picrosirius red, and von Kossa stains to examine bone AP activity, AP expres- 
sion, collagen matrix deposition, and mineralization, respectively, by HistoWiz. 

Statistical Analyses 

Results are expressed as the mean, and error bars represent SEM. Differences
between two groups were examined by two-tailed unpaired t test. *, p < 0.05; **,
p < 0.01; and ***, p < 0.001. 
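
The summary statistics named above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes the equal-variance (pooled) form of the unpaired t test, which the text does not specify, and the function names and the "ns" label are hypothetical. The two-tailed p value would then be read from a t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def unpaired_t(a, b):
    """Student's t statistic and degrees of freedom for two independent groups.

    Assumes equal variances (pooled estimate); this is an assumption,
    since the text only says "two-tailed unpaired t test".
    """
    na, nb = len(a), len(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def stars(p):
    """Map a p value to the asterisk notation used in the figures."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # hypothetical label for "not significant"
```
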

ACCESSION NUMBERS 

All mRNA-seq data are listed in Table S2 and deposited in NCBI-Gene Expres- 
sion Omnibus database under accession number GSE58123. 
SUMMARY 

Outbreaks of fatal leukemia-like cancers of marine 
bivalves throughout the world have led to massive 
population loss. The cause of the disease is un- 
known. We recently identified a retrotransposon, 
Steamer, that is highly expressed and amplified to 
high copy number in neoplastic cells of soft-shell 
clams (Mya arenaria). Through analysis of Steamer 
integration sites, mitochondrial DNA single-nucleo- 
tide polymorphisms (SNPs), and polymorphic micro- 
satellite alleles, we show that the genotypes of 
neoplastic cells do not match those of the host ani- 
mal. Instead, neoplastic cells from dispersed loca- 
tions in New York, Maine, and Prince Edward Island 
(PEI), Canada, all have nearly identical genotypes 
that differ from those of the host. These results indi- 
cate that the cancer is spreading between animals in 
the marine environment as a clonal transmissible cell 
derived from a single original clam. Our findings 
suggest that horizontal transmission of cancer cells 
is more widespread in nature than previously 
supposed. 

INTRODUCTION 

Cancers most often arise as a result of mutations accumulated in 
somatic cells during the lifetime of an organism and constitute a 
clonal expansion of a transformed cell with a genotype derived 
from the host. Tumors are generally not contagious or trans- 
mitted to other individuals, being subject to immune recognition 
and rejection based on polymorphic surface proteins, notably 
the major histocompatibility complex (MHC) in vertebrates. 
Some tumors are induced by infectious agents such as viruses, 
and though these agents can be contagious, each tumor still 
arises in the infected individual by transformation of somatic 
cells. Two cases are known in which a tumor cell itself naturally 
spreads among individuals as a contagious cell line. These are 
the canine transmissible venereal tumor (CTVT) (Murgia et al., 
2006), transmitted by sexual contact, and the Tasmanian devil 
facial tumor disease (DFTD) (Pearse and Swift, 2006), trans- 
mitted between individuals by bites. In these two cases, the 
tumors exhibit a genotype that does not match that of their 
host. Instead, the tumor cells found in all of the affected animals 



are a single clone with a unique genotype that reflects that of its 
original, primordial host. 

Disseminated, or hemic, neoplasia is a leukemia-like cancer 
occurring in many marine bivalves, including clams, mussels, 
cockles, and oysters, and is characterized by amplification of 
cells in the hemolymph, the circulatory fluid of molluscs (Barber, 
2004). The disease affects soft-shell clams (Mya arenaria) along 
the east coast of North America and was first documented in the 
species in the 1970s (Brown et al., 1978; Muttray et al., 2012; 
Yevich and Barszcz, 1978). The neoplastic hemocytes are char- 
acterized by distinctive morphology, new surface antigens, cyto- 
plasmic mislocalization of the p53 tumor suppressor protein, 
loss of phagocytic abilities, greatly increased number (Figure 1), 
and dissemination of the cells into tissues (Barber, 2004; Miosky 
et al., 1989; Smolowitz et al., 1989; Walker et al., 2006, 2009). 
Most bivalve disseminated neoplasias are aneuploid, with higher 
than normal DNA content, and the neoplastic cells of M. arenaria 
are roughly tetraploid based on flow cytometric analysis of DNA 
content, although there is variation between individuals (Reno 
et al., 1994). The disease can be transmitted to naive animals 
by experimental transplantation of hemocytes (McLaughlin 
et al., 1992; Weinberg et al., 1997). A viral cause has been sus- 
pected, but no infectious agent has been confirmed (Taraska 
and Anne Bottger, 2013; Walker et al., 2009). The disease is ulti- 
mately fatal in the majority of affected clams and is contributing 
to depletion of the species in many areas along the east coast of 
North America (Barber, 2004; Cooper et al., 1982). 

The detection of reverse transcriptase activity in neoplastic 
cells (House et al., 1998) suggested the possibility of retroele- 
ment involvement, and we used high-throughput sequencing 
of clam cDNA to identify a previously unknown LTR-retrotransposon
in soft-shell clams, Steamer, whose expression was 
found to be strongly associated with disease (Arriagada et al., 
2014). In normal clams, the genome contains 2-10 endogenous 
copies of the retrotransposon Steamer (normalized to a single 
copy gene), but in neoplastic hemocytes, the Steamer DNA 
copy number is massively amplified to 150-300 copies. The 
finding of several common Steamer integration sites in the 
neoplastic cells of multiple leukemic animals prompted further 
investigation of the genetics of this cancer. 
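
As an illustration of what "normalized to a single copy gene" can mean in practice, the sketch below converts qPCR threshold cycles into a relative copy number. It assumes a simple delta-Ct calculation with equal amplification efficiency for both amplicons; the text does not state which quantitation method was used, and the function name and Ct values are hypothetical.

```python
def relative_copy_number(ct_target, ct_reference, efficiency=2.0):
    """Estimate target copies per copy of a single-copy reference gene.

    Assumes both amplicons amplify with the same per-cycle efficiency
    (2.0 means perfect doubling); an illustrative assumption only.
    """
    return efficiency ** (ct_reference - ct_target)


# Hypothetical Ct values, chosen only to illustrate the arithmetic.
normal = relative_copy_number(ct_target=23.0, ct_reference=25.0)
neoplastic = relative_copy_number(ct_target=17.0, ct_reference=25.0)
```

With these made-up inputs the normal sample falls in the 2-10 copy range and the neoplastic sample in the 150-300 copy range described above.
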

Here, we first extend the observation that neoplastic cells 
contain common Steamer integration sites that are not 
present in normal animals or in normal tissues of diseased ani- 
mals. These results have two possible explanations: either 
Steamer retrotransposons exhibit unprecedented selectivity for 
specific integration sites in these multiple neoplasms, or these 
Figure 1. Collection of Normal and Leukemic Soft-Shell Clams 

(A) Photograph of representative soft-shell clams (M. arenaria) with siphon partially extended. 

(B) Hemolymph from a normal clam (NYTC-C6), showing attachment of hemocytes to the dish and extension of pseudopodia. Scale bar, 50 μm. 

(C) Hemolymph from a heavily leukemic clam (NYTC-C9) showing lack of attachment and rounded, retractile morphology. Hemolymph of the leukemic clam was 
diluted 1:100 to allow visualization of single cells. Scale bar, 50 μm. 

(D) Map of the eastern coast of North America with the locations of the clam collection sites (made with Mapbox Studio using data from OpenStreetMaps). 



neoplasms did not arise independently but are descendants of a 
primordial leukemic cell carrying these common Steamer inte- 
grations, similar to the contagious cancers observed in dogs 
and Tasmanian devils. We therefore analyzed mitochondrial 
DNA (mtDNA) sequences and polymorphic microsatellite repeat 
loci and found that the genotypes of the neoplastic cells do not 
match those of their hosts. Furthermore, all neoplastic geno- 
types are nearly identical to each other, strongly arguing that 
soft-shell clam leukemia across the Atlantic coast is horizontally 
transmitted as a contagious cancer cell derived from a single 
clonal origin. 

RESULTS 

Common Steamer Integration Sites in Neoplastic Cells 

Neoplastic cells of leukemic animals have a high copy number of 
the Steamer retrotransposon, but this high copy number is not 
present in normal cells. Analysis of DNA from different tissues 
of diseased animals showed that Steamer copy number was 
low in solid tissues and highest in hemocytes though with in- 
creases in Steamer copy number in some solid tissues, probably 
due to dissemination of neoplastic hemocytes (Figure 2). Prelim- 
inary sequence analysis of a small number of Steamer integration 
sites showed that neoplastic DNA contains new integration sites 
not present in normal clam DNA, and surprisingly, these new 
copies were often found at identical locations in different 
leukemic animals (Arriagada et al., 2014). We have now extended 
this analysis to leukemic animals from independent populations 



in New York, Maine, and Prince Edward Island (PEI), Canada. 
Steamer integration sites were cloned from multiple leukemic an- 
imals by inverse PCR, and diagnostic primers were designed to 
amplify each specific integration site. PCR tests using these 
primers then allowed us to score for the presence of each inser- 
tion in neoplastic and tissue DNA (Figure 3A). Of 12 integration 
sites tested, 7 were present in neoplastic DNA samples collected 
from all geographic locations, 4 more were common to the Maine 
and New York neoplasms, but not those from Canada, and 1 site 
was unique to neoplasms from Canada (Figure 3B). These can- 
cer-specific integration sites were not found in any healthy ani- 
mals and were only weakly detected in tissue DNA from leukemic 
animals, likely due to infiltration of the neoplastic cells into 
normal tissue. 
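
The tally of shared integration sites can be reproduced with simple set bookkeeping. In this sketch only the counts (7, 4, and 1 of 12 sites) come from the text; the site labels and the location abbreviations are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical site labels; only the counts (7, 4, 1) come from the text.
presence = {f"site{i}": frozenset({"NY", "ME", "PEI"}) for i in range(1, 8)}
presence.update({f"site{i}": frozenset({"NY", "ME"}) for i in range(8, 12)})
presence["site12"] = frozenset({"PEI"})


def venn_counts(site_presence):
    """Count integration sites by the exact set of locations carrying them."""
    return Counter(site_presence.values())


counts = venn_counts(presence)
```

Each key of `counts` is one region of the Venn diagram, so the three non-empty regions sum to the 12 sites tested.
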

Common mtDNA SNPs in Neoplastic Cells 

The observation of common integration sites raises the possibil- 
ity of a clonal origin for all of the tumors. To test the hypothesis 
that soft-shell clam leukemia is horizontally transmitted between 
animals as a clonal transmissible cancer cell, we examined DNA 
from leukemic and normal clams for single-nucleotide polymorphisms
(SNPs) in the sequences of mitochondrial genes encoding
cytochrome c oxidase subunit I (COI) (Folmer et al., 1994) and
cytochrome b (CYTB) (Table 1; Figure S1). DNA of all healthy animals
(n = 11) and the tissue DNA of all of the leukemic animals 
contained a common allele (C685) in the CYTB gene, while the 
neoplastic hemocyte DNA of all leukemic animals (n = 9) carried 
a distinctive SNP (C685T). This observation supports the 
Figure 2. Steamer Retrotransposon DNA Copy Numbers in Different 
Clam Tissues 

Quantitative PCR was used to determine the copy number of the Steamer 
retrotransposon in genomic DNA from hemocytes and solid tissues of four 
representative normal (left) and four leukemic (right) clams. Steamer copy 
number was quantitated using primers that amplify a region in the Steamer 
reverse transcriptase (RT) and was normalized to the single-copy gene EF1 
(Siah et al., 2011). *p < 0.01 for comparisons between neoplastic hemocytes 
and each other tissue tested (n = 3 for gills, n = 4 for other tissues), using two-tailed
paired t test of normalized Steamer copy number. 



and not in any of the 27 previously identified M. arenaria COI 
alleles (Strasser and Barber, 2009), suggesting that this mutation 
occurred during the evolution of the transmissible cells in this 
population. Additionally, seven low-frequency SNPs were identi- 
fied in the mitochondrial DNA of normal animals and in the tissue 
of leukemic animals. In the four informative positions where rare 
SNPs were detected in tissue DNA of leukemic animals, the 
neoplastic hemocyte DNA did not contain that matching tissue 
allele but, rather, contained an allele common to all neoplastic 
cells (Table 1). These data indicate that the leukemias did not 
arise from their hosts and are consistent with clonal transmission 
of the cancer. 
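
The host-versus-neoplasm comparison amounts to checking, position by position, whether the hemocyte allele matches the tissue allele of the same animal. The sketch below uses the two SNP positions named in the text (CYTB 685 and COI 649); the per-animal records are illustrative examples, not entries copied from Table 1, and the function name is hypothetical.

```python
def discordant_positions(tissue, hemocyte):
    """Return sorted (gene, position) keys where hemocyte and tissue alleles differ."""
    return sorted(
        pos for pos in tissue
        if pos in hemocyte and tissue[pos] != hemocyte[pos]
    )


# Illustrative records for one leukemic animal from the USA and one from PEI.
usa_tissue = {("CYTB", 685): "C", ("COI", 649): "G"}
usa_hemocyte = {("CYTB", 685): "T", ("COI", 649): "G"}
pei_tissue = {("CYTB", 685): "C", ("COI", 649): "G"}
pei_hemocyte = {("CYTB", 685): "T", ("COI", 649): "A"}

usa_diff = discordant_positions(usa_tissue, usa_hemocyte)
pei_diff = discordant_positions(pei_tissue, pei_hemocyte)
```

A normal animal would return an empty list, whereas any non-empty result indicates that the hemocyte mtDNA did not derive from the host.
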

Microsatellite Genotypes Confirm Clonal Origin of Clam 
Leukemia 

To characterize the genotype of the neoplastic cells in greater 
detail, we analyzed ten microsatellite loci in nuclear DNA, previously
identified as polymorphic in M. arenaria (Krapal et al., 2012;
St-Onge et al., 2013), by PCR. The sizes of these loci were highly 
polymorphic among individuals, and in many cases the animals 
were heterozygous at the loci, with PCR products of two sizes 
(Figure 4). In normal animals, the alleles present in hemocyte 
DNA always matched the alleles in the host tissue DNA. In 
contrast, in all leukemic animals, the microsatellite genotypes 
in the neoplastic hemocytes were distinct from those in the tis- 
sue of the host animal. Furthermore, at nearly every locus, the 
microsatellite alleles were of the same lengths in the neoplastic 
hemocytes of all leukemic animals. That is, the genotypes of all 
of the tumors were nearly identical to each other. 

Next, fluorescently labeled primers were used to distinguish 
microsatellite PCR product sizes with single-base resolution. 
While all normal animals have one allele (homozygous) or two 
alleles (heterozygous) for each locus, microsatellite size analysis 
of the leukemic DNA revealed up to four separate alleles at some 
of the loci. Many of the leukemic alleles were identical between 
independent neoplasms, but there were some small variations 
in repeat size and several cases of allele loss or gain in some 
leukemic individuals. The model-based clustering program 
STRUCTURE (Pritchard et al., 2000) was used to analyze the mi- 
crosatellite allele sizes, and regardless of the number of clusters 
(K) used, the neoplastic hemocyte genotypes clearly grouped 
separately from the normal animal genotypes, while the tissue 
of the leukemic animals clustered with normal animals (Fig- 
ure 5A). A neighbor-joining tree based on the genetic distances 
between individuals (Bruvo et al., 2004; Kamvar et al., 2014) 
clearly demonstrated that the neoplastic hemocyte genotypes 
cluster as a separate group, distinct from all of the normal geno- 
types and from the genotypes of the host tissue. Two branches 
were apparent within the neoplastic lineage, suggesting that the 
extant leukemias arose from a common ancestor that diverged 
into the PEI and USA cancer subgroups (Figure 5B). 
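
To make the clustering logic concrete, the sketch below computes a simple allele-sharing distance between multilocus genotypes. It is a deliberately simplified stand-in, not the Bruvo distance or the STRUCTURE model used in the analysis; locus names and allele sizes are hypothetical. Allele sets of any size are accepted, so loci with up to four alleles pose no problem.

```python
def allele_sharing_distance(g1, g2):
    """Mean over shared loci of 1 - |shared alleles| / |all alleles| (Jaccard distance)."""
    loci = g1.keys() & g2.keys()
    if not loci:
        raise ValueError("genotypes share no loci")
    total = 0.0
    for locus in loci:
        a, b = set(g1[locus]), set(g2[locus])
        total += 1 - len(a & b) / len(a | b)
    return total / len(loci)


# Hypothetical allele sizes (bp) at two loci.
host_tissue = {"locA": (150, 154), "locB": (200, 200)}
neoplasm_1 = {"locA": (146, 158, 162), "locB": (204, 208)}
neoplasm_2 = {"locA": (146, 158, 162), "locB": (204, 210)}

d_neoplasms = allele_sharing_distance(neoplasm_1, neoplasm_2)
d_host = allele_sharing_distance(host_tissue, neoplasm_1)
```

In this toy example the two neoplasms are far closer to each other than either is to the host tissue, which is the pattern a distance-based tree would render as a separate neoplastic branch.
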

DISCUSSION 



hypothesis that the neoplastic cells are transferred between an- 
imals as an allograft that contains this unique SNP. Another SNP 
(G649A in COI) was identified in the hemocyte DNA of all four 
leukemic animals from PEI, but not in those from other locations 



Three data sets (Steamer integration sites, mtDNA SNPs, and 
microsatellite alleles) show that the genotypes of the neoplastic 
cells all differ from the genotype of their host animals and are 
identical or very closely related to each other. We conclude 
Figure 3. Clonal Steamer Integration Sites in Neoplastic Cells 

(A) To determine the presence of specific integration sites in different animals, 
inverse PCR products were cloned and sequenced from clams from three 
locations (two integration sites per animal). For each integration site, a reverse 
primer was designed to match the flanking genomic sequence. Diagnostic 
PCR using a common forward primer in the Steamer LTR and an integration 
site-specific reverse primer was used to determine the presence of each 
specific integration site in hemocyte (H) and tissue (T) DNA as indicated, of 
normal (N) and leukemic (L) clams. Sizes of the amplified DNAs were analyzed 
by agarose gel electrophoresis. Amplification of EF1 is shown as a control. 
Filled triangles mark the position of migration of the amplicon. An open triangle 
marks an unexpected PCR product. This band is due to amplification of a 
second Steamer integration site present in neoplastic cells of all populations. 

(B) Venn diagram of the number of cancer-specific integration sites shared by 
neoplastic cells from the three locations (out of the 12 sites tested, including 4 
sites cloned from PEI samples previously [Arriagada et al., 2014]). None of the 
12 sites were present in any normal animal tested. 



that the individual leukemias are not derived by independent 
oncogenic transformation of cells within each host but instead 
come from a single genetically unrelated parent. The data 
strongly argue that disseminated neoplasia is naturally 
spreading between animals as a transmissible cancer cell. Hor- 
izontal clonal transmission of cancer has only been observed in 
two other contagious cancers of mammals in the wild, the DFTD 
in Tasmanian devils (Pearse and Swift, 2006) and CTVT in dogs 
(Murgia et al., 2006). 

It is remarkable that neoplastic cells from clam beds sepa- 
rated by hundreds of miles were found to be genetically nearly 
identical or very closely related. The mechanism by which the 
soft-shell leukemic cell line could be transferred from animal 



to animal in the wild is not clear. In the two previously known 
cases of natural cancer cell transfer (DFTD and CTVT) (Murch- 
ison, 2008), physical contact is required for transmission of 
cancer cells to naive individuals (through biting or sexual 
contact, respectively), but adult clams are sessile and do not 
normally come in contact with one another. Clams do filter 
feed, however, raising the possibility that leukemia engraftment 
occurs through filtration of seawater contaminated with neo- 
plastic cells. Each animal can filter several liters of seawater 
per hour, so very low concentrations of cells in seawater could 
be sufficient for transmission. One early study showed that 
hemocytes from a leukemic clam could survive in natural 
seawater conditions for >6 hr with minimal cell death, suggest- 
ing a plausible route for cancer cell transmission from one clam 
to another (Sunila and Farley, 1989). Direct surveys of seawater 
at the affected clam beds for tumor cells could provide support 
for this idea. It is currently unknown how neoplastic cells would 
be released by leukemic animals. Cells may be shed naturally 
during disease, expelled during spawning, or released after 
physical trauma, during predation, or at death of the leukemic 
individual. The stage of development when transmission might 
occur in nature is unknown, though experimental transmission 
of disease was found to be most efficient with mature animals 
of intermediate size (Taraska and Anne Bottger, 2013). It is 
possible that its transmission has been facilitated or acceler- 
ated by human intervention, as seed stocks have been trans- 
planted between sites along the coast at several times in recent 
decades (Beal and Kraus, 2002). 
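
The filter-feeding argument rests on simple arithmetic: exposure scales with cell concentration, filtration rate, and time. The sketch below makes that explicit; the specific numbers are hypothetical placeholders consistent with "several liters per hour" and a survival window of about 6 hr, not measured values.

```python
def cells_encountered(cells_per_liter, liters_per_hour, hours):
    """Expected number of suspended cells drawn in by one filter-feeding animal."""
    return cells_per_liter * liters_per_hour * hours


# Hypothetical inputs: 1 cell per liter, 3 liters filtered per hour, 6 hours.
exposure = cells_encountered(cells_per_liter=1, liters_per_hour=3, hours=6)
```

Even at one cell per liter, a single clam under these assumptions would encounter 18 cells within the survival window.
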

The length of time since the original formation of the primor- 
dial soft-shell neoplasia is not yet clear. The disease has been 
documented in soft-shell clams since the 1970s (Brown et al., 
1978; Yevich and Barszcz, 1978), and it is likely that all cases 
derive from the same clone, suggesting that it may be at least 
40 years old and possibly much older. Its appearance must 
have been sufficiently far in the past to allow for its spread 
widely along the North Atlantic coast. Significant divergence 
was observed in cancer cell subgroups specific to the USA 
and Canada clam populations. Several small microsatellite 
expansions and deletions and at least one mtDNA SNP appear 
to have developed separately in the PEI subgroup of cancer 
cells. Further analysis of the origin and continued evolution of 
these cancer cell lineages may allow for estimation of the 
time of appearance of the original leukemia. The two other ex- 
amples of transmissible tumors are of very different ages: the 
dog tumor is thought to have arisen 10,000-13,000 years ago 
(Murchison et al., 2014), while the Tasmanian devil tumor is 
of much more recent origin, perhaps arising only 20-30 years 
ago (Murchison, 2008). 

It is plausible that the Steamer retrotransposon had a role 
in the development of the transmissible disseminated neoplasia. 
Steamer genomic copy number is greatly expanded in 
neoplastic cells (from 2-10 copies to 150-300 per haploid 
genome), and we show that the majority of the new integrations 
are common to all tested neoplastic cells and therefore likely 
occurred early in the evolution of the cancer. One or more of 
these common integration events may have directly caused 
initial oncogenic mutations, similar to the LINE1 integration 
near the c-myc gene in all known cases of CTVT (Katzir et al., 
Nucleotide numbers based on the complete COI and CYTB genes in M. arenaria mtDNA genome (KJ755996). Boxes indicate discordance between 
hemocyte and tissue DNA. Three additional CYTB SNPs were identified: T442C (L148) in MELC-C4 and T658C (F220L) and T696C (I232) in MELC-A4. 
T, tissue; H, hemocyte; not done due to lack of availability of tissue samples. 

^Hemocyte allele can also be observed in tissue DNA; see sequencing traces in Figures S1A and S1B. 



1985; Murgia et al., 2006), but alternatively they may represent 
passenger mutations that were fixed in the genome of the leuke- 
mia. Some of the integration sites are unique to the PEI or USA 
subgroups of neoplastic cells, and these may represent active 
gain of new copies by retrotransposition or loss of copies due 
to recombination or aneuploidy; these sites are unlikely to be a 
cause of the tumor. Further analysis of the genome or the 
expression profile of the neoplastic cells in comparison with 
those of normal hemocytes could provide clues to the cause of 
the transformed phenotype. 

While all of the M. arenaria neoplastic cells that we have 
tested were derived from the same cell lineage, other leukemias 
may be derived from different clones or by more conventional 
mechanisms. Leukemia can be induced in clams by 5-bromo- 
deoxyuridine (BrdU) injection (Oprandy and Chang, 1983; Tar- 
aska and Anne Bottger, 2013), and these cases must arise by 
de novo transformation of host cells and perhaps by induction 
of Steamer transposition. It has also been reported that clam 
leukemia could be transmitted between animals by a filterable 
agent (Oprandy et al., 1981; Taraska and Anne Bottger, 2013), 
but this is controversial (AboElkhair et al., 2012; McLaughlin 
et al., 1992). 

Transmissible disseminated neoplasias have been reported in 
other bivalve species, including other clams, oysters, mussels,



and cockles (Barber, 2004), and these diseases may also be 
caused by a clonal transmissible cancer cell. A recent report 
(Vassilenko et al., 2010) that neoplastic hemocytes in mussels 
(Mytilus trossulus) on the west coast of North America share a 
common set of synonymous SNPs in the p53 gene is consistent 
with an independent development of a transmissible clonal can- 
cer in that species. 

Flow cytometric analyses of DNA content of disseminated 
neoplasias in many bivalve species have identified character- 
istic ploidy levels— indeed, polyploidy has been used as diag- 
nostic for disease (Delaporte et al., 2008). In M. arenaria specif- 
ically, the observation of roughly tetraploid DNA content in 
neoplastic cells (Delaporte et al., 2008; Reno et al., 1994) is 
consistent with a clonal contagious cancer and with the obser- 
vation of up to four unique microsatellite alleles at a single locus 
in neoplastic cells. Given the current findings, previous obser- 
vations of abnormal ploidy in bivalve disseminated neoplasia 
suggest that many of these diseases in other species may 
represent independent contagious cancer lineages. For 
example, mussel-disseminated neoplasia has been reported 
to come in two types, tetraploid or pentaploid (Moore et al., 
1991), consistent with either two independent cancer lineages 
or one that has evolved into subgroups with divergent ploidy, 
although a more recent study found a wider distribution of 
aneuploidy between individuals (Vassilenko and Baldwin,
2014). Multiple abnormal ploidy levels have also been observed 
in disseminated neoplasia in cockles (Cerastoderma edule) (da 
Silva et al., 2005). We are currently investigating the hypothesis 
suggested by the current findings that disseminated neoplasia 
in other bivalve species represent independent lineages of con- 
tagious cancer. Preliminary mtDNA sequencing of cockles with 
disseminated neoplasia suggests a cockle-derived contagious 
cancer line (data not shown). 

It is possible that the disseminated neoplasia of M. arenaria 
could transmit to other species, though there is as of yet no 
evidence for this. There may be mechanisms by which a host 
animal can reject colonization by neoplastic cells of another 
species. Molluscs are not known to have a self/nonself recog- 
nition system similar to the MHC system of vertebrates, and 
their lack of MHC may make them highly susceptible to this 
form of infectious malignancy. Indeed, in the cases of CTVT 
and DFTD, there are unique mechanisms by which the cancers 
avoid MHC recognition. In CTVT, expression of MHC I and II is 
downregulated during active growth of the tumor, and their 
expression leads to immune recognition and clearance of the 
tumor in most dogs (Yang et al., 1987). DFTD cells also down- 



Figure 4. Amplification of Microsatellite 
Loci from Tissue and Hemocyte DNA from 
Normal and Leukemic Clams 

PCR products using primers flanking ten micro- 
satellite loci in hemocyte DNA and tissue DNA 
were displayed by electrophoresis on 2.5% 
agarose gels and visualized by staining. Different 
alleles are determined by the sizes of the ampli- 
cons, with one band observed for animals homo- 
zygous at a particular locus and two or more for 
heterozygotes and polyploid neoplastic cells. 
Each row of gels represents amplification from a 
single microsatellite locus, as labeled on the left. 
Hemocyte DNA is shown for normal and leukemic 
clams from PEI, Canada, and both tissue (T) and 
hemocyte (H) DNA is shown for Maine and New 
York clams. These data show that the leukemic 
hemocyte microsatellite alleles are identical to 
each other and distinct from their host tissue. 



regulate MHC expression, and the low 
genetic diversity of devils limits the ability 
of hosts to recognize DFTD as foreign 
(Siddle and Kaufman, 2015; Siddle et al., 
2013). Contagious cancer is a serious 
threat to marine invertebrates, leading to 
severe mortalities during outbreaks in 
soft-shell clams and possibly leading to 
mass mortalities in many other species. 
The disease represents a significant se- 
lective pressure and supports the hypoth- 
esis that histocompatibility could have 
evolved, in part, due to selective pressure 
to prevent malignancy (Murgia et al., 
2006) rather than simply being a second- 
ary consequence of pressure by infec- 
tious diseases. Despite the lack of MHC, molluscs and 
other invertebrates may employ other self/nonself recognition 
mechanisms that can combat this type of disease, perhaps 
similar to the fusion/histocompatibility (Fu/HC) system of 
colonial ascidians, which protects ascidians from stem cell 
parasitism, which can occur when unrelated individuals fuse 
(De Tomaso et al., 2005; Voskoboynik et al., 2013). Bivalves 
may utilize a histocompatibility system that may be unique to 
molluscs or evolutionarily related to Fu/HC or MHC. 

Natural horizontal transmission of cancer between individuals 
has been considered a rare phenomenon, restricted to two 
exceptional cases in mammals. Our finding of the horizontal 
transmission of a clonal clam leukemia extends the phenomenon 
to the marine environment and demonstrates that this mecha- 
nism is more widespread in nature than previously supposed. 

EXPERIMENTAL PROCEDURES 

Sample Collection 

Soft-shell clams (M. arenaria) were collected from several locations (Figure 1).
Clams from the Dunk estuary on Prince Edward Island (PEI), Canada were 
collected and diagnosed as described previously (Arriagada et al., 2014). 
Clams from Larrabee Cove, Maine (MELC) were collected by Brian Beal 
(University of Maine at Machias) as described previously, and one additional
leukemic clam (ME-HL01) was collected by a commercial source near St.
George, Maine with the help of Charles Walker (University of New Hampshire).
Clams from New York (NYTC) were acquired from a commercial market and
collected from the north shore of Long Island, New York near Port Jefferson.
Individual clam IDs reflect the location and the ID number assigned at the
time of collection.

Diagnosis of clams from New York and Maine was conducted by extracting 
hemolymph from the pericardial region using a 26 gauge needle. Six to eight 
drops of hemolymph were placed in a well of a 96-well plate and left undisturbed 
at 10°C for 1 hr before morphological analysis using phase-contrast microscopy.
Normal clam hemocytes attached to the dish (Figure 1A), and clams with >50%
rounded, retractile cells were considered leukemic (Figure 1B).

PCR and qPCR 

Primers and conditions used for qPCR, diagnostic PCR for specific integration 
sites, COI (Folmer et al., 1994) and CYTB PCR, and PCR of microsatellite loci 
are provided in Extended Experimental Procedures and Table S1.



Figure 5. Analysis of Microsatellite Alleles 

(A) CLUMPAK display of STRUCTURE (Pritchard 
et al., 2000) analysis of ten microsatellite loci 
showing the major population clustering with 
varying number of clusters (K) from K = 2-4. Each 
vertical bar represents a normal clam genotype 
or either tissue or hemocyte genotypes from 
leukemic clams with the colors representing 
cluster identity. 

(B) Neighbor-joining tree constructed with genetic 
distances based on ten microsatellite loci using 
Bruvo’s method for analysis of loci from in- 
dividuals with variable ploidy (Bruvo et al., 2004) 
calculated using the poppr package with R 
(Kamvar et al., 2014). Bootstrap values above 25 
are shown at nodes. The scale bar represents a 
genetic distance of 0.05 (where 0 represents 
completely identical genotypes and 1 represents 
no common alleles). Each sample is marked 
as tissue (T, closed circle) or hemocytes (H, open 
circle) of normal (black) or leukemic (red) clams. 
The leukemic genotypes all cluster together in two 
branches clearly apart from the normal geno- 
types. Allele sizes are listed in Table S2, and 
further information is available in the Extended 
Experimental Procedures. 



Microsatellite Analysis 

STRUCTURE (Pritchard et al., 2000) was used to 
determine the population clustering of the clam tis- 
sue and hemocyte DNA. As up to four unique 
alleles were detected in leukemic DNA, loci were 
considered to be tetraploid, and clams with lower 
ploidy were considered to have missing data for 
the additional alleles, which are ignored in analysis. 
Cluster values of K = 1 to 10 were used with an 
admixture model with a burnin of 50,000 and 
50,000 replicates after burnin. CLUMPAK (Kopel- 
man et al., 2015) was used to graphically analyze 
the major clusters across ten STRUCTURE runs. 
Data were also recoded and reanalyzed as binary 
absent/present alleles (Rodzen and May, 2002), 
with nearly identical results (data not shown). 
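
The two data encodings described above can be illustrated with a minimal sketch. It assumes each genotype at a locus is a list of unique allele sizes, that missing alleles are coded as -9 (a common convention for STRUCTURE input, treated here as an assumption), and that the sample names and allele sizes are hypothetical rather than data from this study.

```python
MISSING = -9  # assumed missing-data code; not stated in the text


def pad_to_ploidy(alleles, ploidy=4, missing=MISSING):
    """Pad the unique alleles at one locus to a fixed ploidy with missing data."""
    unique = sorted(set(alleles))
    if len(unique) > ploidy:
        raise ValueError("more unique alleles than the assumed ploidy")
    return unique + [missing] * (ploidy - len(unique))


def binary_recode(genotypes):
    """Recode one locus as presence (1) or absence (0) of each observed allele."""
    all_alleles = sorted({a for g in genotypes.values() for a in g})
    table = {
        sample: [1 if a in g else 0 for a in all_alleles]
        for sample, g in genotypes.items()
    }
    return all_alleles, table


# Hypothetical allele sizes (bp) at a single locus.
locus = {
    "normal_T": [180, 188],
    "leukemic_H": [176, 180, 192, 196],
}

padded = {sample: pad_to_ploidy(g) for sample, g in locus.items()}
columns, presence = binary_recode(locus)
```

In this sketch the diploid sample is padded with two missing values so that every row has four allele columns, while the binary table has one column per allele observed anywhere in the data set.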

Genetic distance was calculated using Bruvo’s 
band-sharing method assuming infinite alleles 
(Bruvo et al., 2004), as it was created specifically 
for comparisons of individuals with different ploidy levels, and a neighbor- 
joining tree was constructed (with 100 bootstraps) using the poppr package 
for R (Kamvar et al., 2014) and displayed using Figtree. Alternate analyses us- 
ing the genome addition and combined genome addition/genome loss models 
generated nearly identical results (data not shown). Further details of the mi- 
crosatellite analysis can be found in the Extended Experimental Procedures. 
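
As a rough illustration of the distance calculation, the sketch below computes a Bruvo-style distance at a single locus under the infinite model, in which each allele missing from the lower-ploidy genotype contributes a distance of 1. It is a simplified stand-in, not the poppr implementation used here; the function name, the repeat length, and the allele sizes are all hypothetical.

```python
from itertools import permutations


def bruvo_locus_distance(a, b, repeat_len):
    """Bruvo-style distance at one locus, infinite model for missing alleles."""
    # Convert fragment sizes (bp) to repeat counts.
    a = [x / repeat_len for x in a]
    b = [x / repeat_len for x in b]
    k = max(len(a), len(b))
    # Pad the lower-ploidy genotype; None marks an "infinitely distant" allele.
    a = a + [None] * (k - len(a))
    b = b + [None] * (k - len(b))

    def allele_distance(x, y):
        if x is None or y is None:
            return 1.0
        # Geometric transform of the difference in repeat number.
        return 1.0 - 2.0 ** (-abs(x - y))

    # Minimum total distance over all pairings of alleles, averaged over k.
    best = min(
        sum(allele_distance(x, y) for x, y in zip(a, p))
        for p in permutations(b)
    )
    return best / k


# Hypothetical genotypes with a 4-bp repeat unit.
d_same = bruvo_locus_distance([100, 104], [100, 104], 4)
d_step = bruvo_locus_distance([100, 104], [100, 108], 4)
d_ploidy = bruvo_locus_distance([100], [100, 104], 4)
```

Averaging such per-locus values across the ten loci would give a pairwise distance between 0 (identical genotypes) and 1 (no common alleles), the scale used for the neighbor-joining tree.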

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, 
one figure, and two tables and can be found with this article online at
http://dx.doi.org/10.1016/j.cell.2015.02.042.
SUMMARY

The gastrointestinal (GI) tract contains much of
the body’s serotonin (5-hydroxytryptamine, 5-HT), 
but mechanisms controlling the metabolism of gut- 
derived 5-HT remain unclear. Here, we demonstrate 
that the microbiota plays a critical role in regulating 
host 5-HT. Indigenous spore-forming bacteria (Sp) 
from the mouse and human microbiota promote 5- 
HT biosynthesis from colonic enterochromaffin cells 
(ECs), which supply 5-HT to the mucosa, lumen, and 
circulating platelets. Importantly, microbiota-depen- 
dent effects on gut 5-HT significantly impact host 
physiology, modulating GI motility and platelet function.
We identify select fecal metabolites that are
increased by Sp and that elevate 5-HT in chromaffin 
cell cultures, suggesting direct metabolic signaling 
of gut microbes to ECs. Furthermore, elevating 
luminal concentrations of particular microbial metab- 
olites increases colonic and blood 5-HT in germ-free 
mice. Altogether, these findings demonstrate that Sp 
are important modulators of host 5-HT and further 
highlight a key role for host-microbiota interactions 
in regulating fundamental 5-HT-related biological 
processes. 



INTRODUCTION 

In addition to its role as a brain neurotransmitter, the monoamine
serotonin (5-hydroxytryptamine [5-HT]) is an important regulatory
factor in the gastrointestinal (GI) tract and other organ systems.
More than 90% of the body’s 5-HT is synthesized in the gut, where
5-HT activates as many as 14 different 5-HT receptor subtypes
(Gershon and Tack, 2007) located on enterocytes (Hoffman et al.,
2012), enteric neurons (Mawe and Hoffman, 2013), and immune cells
(Baganz and Blakely, 2013). In addition, circulating platelets
sequester 5-HT from the GI tract, releasing it to promote hemostasis
and distributing it to various body sites (Amireault et al., 2013).
As such, gut-derived 5-HT regulates diverse functions, including
enteric motor and secretory reflexes (Gershon and Tack, 2007),
platelet aggregation (Mercado et al., 2013), immune responses
(Baganz and Blakely, 2013), bone development (Chabbi-Achengli et al.,
2012; Yadav et al., 2008), and cardiac function (Cote et al., 2003).
Furthermore, dysregulation of peripheral 5-HT is implicated in the
pathogenesis of several diseases, including irritable bowel syndrome
(IBS) (Stasi et al., 2014), cardiovascular disease (Ramage and
Villalon, 2008), and osteoporosis (Ducy and Karsenty, 2010).

The molecular mechanisms controlling the metabolism of 
gut 5-HT remain unclear. In the GI tract, 5-HT is synthesized
by specialized endocrine cells, called enterochromaffin cells 
(ECs), as well as mucosal mast cells and myenteric neurons 
(Gershon and Tack, 2007), but the functions of these different 
pools of gut 5-HT are incompletely understood. In addition, 
two different isoenzymes of tryptophan hydroxylase (Tph), 
Tph1 and Tph2, mediate non-neuronal versus neuronal 5-HT
biosynthesis (Walther et al., 2003), but little is known regarding 
the endogenous signals that regulate Tph expression and 
activity. 

Mammals are colonized by a vast and diverse collection of 
microbes that critically influences health and disease. Recent 
studies highlight a role for the microbiota in regulating blood 
5-HT levels, wherein serum concentrations of 5-HT are substan- 
tially reduced in mice reared in the absence of microbial coloni- 
zation (germ-free [GF]), compared to conventionally-colonized 
(specific pathogen-free [SPF]) controls (Sjogren et al., 2012; Wik- 
off et al., 2009). In addition, intestinal ECs are morphologically 
larger in GF versus SPF rats (Uribe et al., 1994), which suggests 
that microbes could impact the development and/or function of 
5-HT-producing cells. Interestingly, some species of bacteria 
grown in culture can produce 5-HT (Tsavkelova et al., 2006), 
raising the question of whether indigenous members of the mi- 
crobiota contribute to host 5-HT levels through de novo synthe- 
sis. Based on this emerging link between the microbiota and 
serum 5-HT concentrations, we aimed to determine how path- 
ways of 5-HT metabolism are affected by the gut microbiota, 
to identify specific microbial communities and factors involved 
in conferring serotonergic effects, and to evaluate how microbial 
modulation of peripheral 5-HT impacts host physiology. 

Here, we show that the microbiota promotes 5-HT biosyn- 
thesis from colonic ECs in a postnatally inducible and reversible 
manner. Spore-forming microbes (Sp) from the healthy mouse 
and human microbiota sufficiently mediate microbial effects on 
serum, colon, and fecal 5-HT levels. We further explore potential 
host-microbial interactions that regulate peripheral 5-HT by 
surveying microbial influences on the fecal metabolome. We find
that particular microbial metabolites are elevated by Sp and
likely signal directly to colonic ECs to promote 5-HT biosynthesis.
Importantly, microbiota-mediated changes in colonic 5-HT regulate
GI motility and hemostasis in the host, suggesting that targeting
the microbiota can serve as a tractable approach for modulating
peripheral 5-HT bioavailability and treating 5-HT-related disease
symptoms.

RESULTS 

The Gut Microbiota Modulates Host Peripheral 
Serotonin Levels 

Adult GF mice display deficient serum (Sjogren et al., 2012; Wikoff
et al., 2009) (Figure 1A) and plasma (Figure S1A) 5-HT concentrations
compared to SPF controls, but the cellular sources of this disruption
are undefined. Consistent with the understanding that much of the
body’s 5-HT derives from the GI tract, we find that GF mice exhibit
significantly decreased levels of colonic and fecal 5-HT compared to
SPF controls (Figures 1B and S1A; Table S1). This deficit in 5-HT is
observed broadly across the distal, medial and proximal colon
(Figure S1D), but not in the small intestine (Figures S1A, S2A, and
S2B), suggesting a specific role for the microbiota in regulating
colonic 5-HT.
Decreased levels of 5-HT are localized to colonic chromogranin 
A-positive (CgA+) enterochromaffin cells (ECs) (Figure 2), and 
not to small intestinal ECs (Figures S2A and S2B). Low 5-HT 
signal is seen in both GF and SPF colonic mast cells and enteric 



Figure 1. The Gut Microbiota Modulates 
Host Peripheral Serotonin Levels 

(A) Levels of serum 5-HT. Data are normalized to 
serum 5-HT in SPF mice (n = 8-13). 

(B) Levels of colon 5-HT relative to total protein. 
Data are normalized to colon 5-HT relative to total 
protein in SPF mice (n = 8-13). 

(C) Colonic expression of TPH1 relative to GAPDH. 
Data are normalized to expression levels in SPF 
mice (n = 4). 

(D) Colonic expression of SLC6A4 relative to 
GAPDH. Data are normalized to expression levels in 
SPF mice (n = 4). 

Data are presented as mean ± SEM. *p < 0.05, **p < 0.01,
***p < 0.001. n.s., not statistically significant; SPF, specific
pathogen-free (conventionally-colonized); GF, germ-free; CONV., SPF
conventionalized; ABX, antibiotic-treated; VEH, vehicle
(water)-treated.

See also Figure S1.



neurons (Figure 2A), which are minor pro- 
ducers of 5-HT (Gershon and Tack, 
2007). There is no difference between 
adult GF and SPF mice in the abundance 
of CgA+ ECs (Figure 2C), suggesting that
decreases in colon 5-HT result from 
abnormal 5-HT metabolism rather than 
impaired development of ECs. 

To identify the specific steps of 5-HT metabolism that are affected
by the microbiota, key intermediates of the 5-HT pathway were
assessed in colons from GF versus SPF mice. We find that GF colons
exhibit decreased expression of TPH1 (Figures 1C and S1D) (Sjogren
et al., 2012), the rate-limiting enzyme for 5-HT biosynthesis in ECs,
but no difference in expression of enzymes involved in 5-HT
packaging, release and catabolism (Figure S1C). GF mice also display
elevated colonic expression of the 5-HT transporter SLC6A4 (Figures
1D and S1E) (Sjogren et al., 2012), synthesized broadly by
enterocytes to enable 5-HT uptake (Wade et al., 1996). This could
reflect a compensatory response to deficient 5-HT synthesis by host
ECs, based on the finding that chemical Tph inhibition modulates
SLC6A4 expression (Figures S2C and S2D). There is no difference
between GF and SPF mice in colonic expression of neural-specific
isoforms of 5-HT enzymes (Figure S1F), consistent with data showing
no apparent difference in 5-HT-specific staining in enteric neurons
(Figure 2). Despite deficient levels of colon, fecal, and serum 5-HT
(Figures 1A, 1B, and S1A; Table S1), GF mice exhibit significantly
increased levels of the Tph substrate, tryptophan (Trp), in both
feces (Table S1) and serum (Sjogren et al., 2012; Wikoff et al.,
2009), suggesting that primary disruptions in host TPH1 expression
result in Trp accumulation. Oral supplementation of GF mice with the
Tph product, 5-hydroxytryptophan (5-HTP), sufficiently ameliorates
deficits in colon and serum 5-HT, whereas supplementation with the
Tph substrate Trp has no restorative effect (Figures S1G-S1I).
Collectively, these data support the notion that the microbiota
promotes 5-HT biosynthesis by elevating TPH1 expression in colonic
ECs.
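
The figure legends report each measurement normalized to the SPF group and summarized as mean ± SEM. A minimal sketch of that arithmetic follows, assuming normalization means dividing every value by the mean of the SPF controls; the numbers are hypothetical placeholders, not measurements from this study.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_to_control(values, control_values):
    """Express each value as a fraction of the control-group mean."""
    control_mean = mean(control_values)
    return [v / control_mean for v in values]


def mean_sem(values):
    """Return the mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical raw readings in arbitrary units.
spf_raw = [100.0, 110.0, 90.0, 100.0]
gf_raw = [40.0, 50.0, 30.0, 40.0]

spf_mean, spf_sem = mean_sem(normalize_to_control(spf_raw, spf_raw))
gf_mean, gf_sem = mean_sem(normalize_to_control(gf_raw, spf_raw))
```

With this convention the SPF group always has a normalized mean of 1, so the other groups read directly as a fraction of the conventionally colonized level.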
Figure 2. Indigenous Spore-Forming Bacteria Increase 5-HT Levels in Colon Enterochromaffin Cells

(A) Representative images of colons stained for chromogranin A (CgA) (left), 5-HT (center), and merged (right). Arrows indicate CgA-positive cells that lack 5-HT
staining (n = 3-7 mice/group).

(B) Quantitation of 5-HT+ cell number per area of colonic epithelial tissue (n = 3-7 mice/group).

(C) Quantitation of CgA+ cell number per area of colonic epithelial tissue (n = 3-7 mice/group).

(D) Ratio of 5-HT+ cells/CgA+ cells per area of colonic epithelial tissue (n = 3-7 mice/group).

Data are presented as mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. SPF, specific pathogen-free (conventionally-colonized); GF, germ-free;
CONV., SPF conventionalized; ABX, antibiotic-treated; Sp, spore-forming bacteria; PCPA, para-chlorophenylalanine.

See also Figure S2.



To confirm that deficient 5-HT levels in GF mice are micro- 
biota-dependent and further determine whether effects are 
age-dependent, GF mice were conventionalized with an SPF 
microbiota at birth (postnatal day [P] 0), weaning (P21), or early 
adulthood (P42) and then evaluated at P56 for levels of 5-HT 
and expression of 5-HT-related genes. GF mice conventional- 
ized at each age with an SPF microbiota exhibit restored 
serum (Figure 1A) and colon (Figure 1B) 5-HT levels, with 
more pronounced effects seen at earlier ages of colonization. 
Colonic expression of TPH1 and SLC6A4 is similarly corrected 
by postnatal conventionalization of GF mice (Figures 1C and
1D), with more substantial changes from P0 conventionalization.
Increases in 5-HT are localized to colonic ECs (Figure 2).
These findings indicate that postnatal reconstitution of the gut 
microbiota can correct the 5-HT deficiency seen in GF mice 
and further suggest that gut microbes exert a continuous ef- 
fect on 5-HT synthesis by modulating EC function. Overall, 
we demonstrate that microbiota-mediated elevation of host 
5-HT is postnatally inducible, persistent from the time of 



conventionalization and not dependent on the timing of host 
development. 

To assess the reversibility of microbial effects on host 5-HT
metabolism, we depleted the gut microbiota in SPF mice via
bi-daily antibiotic treatment beginning on P0, P21, or P42 and
until P56. Treatment of P42 SPF mice with a cocktail of ampicillin,
vancomycin, neomycin, and metronidazole (Reikvam et al., 2011)
sufficiently recapitulates GF-associated deficits in serum and
colon 5-HT and alterations in host colonic TPH1 and SLC6A4
expression (Figures 1 and 2). Interestingly, P0 and P21 antibiotic
treatment also induces GF-related deficits in colonic 5-HT, but the
effects on serum 5-HT are more pronounced when administered at P42,
compared to P0 and P21 (Figure 1), suggesting potential confounding
effects of early life or prolonged antibiotic treatment on
microbiota-mediated modulation of peripheral 5-HT. Antibiotics can
elicit several direct effects on host cells (Shimizu et al., 2003;
Westphal et al., 1994), which may underlie differences between P0
treatment and GF status. That P42 antibiotic treatment of SPF mice
results in
Figure 3. Indigenous Spore-Forming Bacteria Induce Colon 5-HT
Biosynthesis and Systemic 5-HT Bioavailability

(A) Levels of serum 5-HT. Data are normalized to 
serum 5-HT levels in SPF mice. SPF, n = 13; GF, n = 
17; GF+conv, P21 conventionalization, n = 4; 
SPF+Abx, P42 antibiotic treatment, n = 7; B. fragilis 
monoassociation (BF), n = 6; SFB, Segmented 
Filamentous Bacteria monoassociation, n = 4; ASF, 
Altered Schaedler Flora P21 colonization, n = 4; Sp, 
spore-forming bacteria, P21 colonization, n = 4; 
B. uniformis P21 colonization, n = 4; Bd, Bacter- 
oides consortium, n = 3. 

(B) Levels of colon 5-HT relative to total protein. 
Data are normalized to colon 5-HT relative to total 
protein in SPF mice (n = 5-15). 

(C) Levels of colon 5-HT relative to total protein after 
intrarectal treatment with the Tph inhibitor, PCPA, 
or vehicle (n = 4). 

(D) Colonic expression of TPH1 relative to GAPDH. 
Data are normalized to expression levels in SPF 
mice (n = 3). 

Data are presented as mean ± SEM. *p < 0.05, 
**p < 0.01, ***p < 0.001, ****p < 0.0001. n.s., not 
statistically significant; SPF, specific pathogen- 
free (conventionally-colonized); GF, germ-free; 
Sp, spore-forming bacteria; PCPA, para-chlor- 
ophenylalanine. 

See also Figure S3. 



5-HT phenotypes analogous to those seen in GF mice demon- 
strates that microbiota effects on host 5-HT can be abrogated 
postnatally and further supports the plasticity of 5-HT modula- 
tion by indigenous gut microbes. Altogether, these data indicate 
that the gut microbiota plays a key role in raising levels of colon 
and serum 5-HT, by promoting 5-HT in colonic ECs in an induc- 
ible and reversible manner. 

Indigenous Spore-Forming Microbes Promote Host 
Serotonin Biosynthesis 

In light of our finding that 5-HT levels are decreased in colons 
but not small intestines of GF mice compared to SPF controls, 
we hypothesized that specific subsets of gut microbes are 
responsible for affecting host 5-HT pathways. Mice mono- 
colonized with Bacteroides fragilis or Segmented Filamentous 
Bacteria (SFB) display deficits in serum 5-HT that are compara- 
ble to those seen in GF mice (Figure 3A). Moreover, postnatal 
colonization (P42) with Bacteroides uniformis, altered Schae- 
dler flora (ASF), an eight-microbe consortium known to correct 
gross intestinal pathology in GF mice (Dewhirst et al., 1999), or 
with cultured Bacteroides spp. from the SPF mouse microbiota, 
has no significant effect on the 5-HT deficiency seen in GF mice 
(Figures 3A and 3B). Interestingly, however, GF mice colonized 
at P42 with indigenous spore-forming microbes from the 
mouse SPF microbiota (Sp), known to be dominated by Clos- 
tridial species (Atarashi et al., 2013; Stefka et al., 2014) (Table 
S2), exhibit complete restoration of serum and colon 5-HT to 
levels observed in SPF mice (Figures 3A and 3B). Consistent 
with this, Sp colonization of GF mice increases 5-HT staining 



colocalized to CgA+ ECs (Figure 2), elevates host colonic
TPH1 expression (Figure 3D) and decreases SLC6A4 expres- 
sion (Figure 3E) toward levels seen in SPF mice. Improvements 
in serum 5-HT are observed within 2 days after inoculation of 
GF mice with Sp (Figure S2E) and do not correlate with amelio- 
ration of abnormal cecal weight (Figure S2F). Importantly, Sp 
also elevates colonic 5-HT in Rag1 knockout mice (Figure S2G),
which lack adaptive immune cells, indicating that the effects of 
Sp on gut 5-HT are not dependent on Sp-mediated regulatory 
T cell induction (Stefka et al., 2014). Notably, the 5-HT-promot- 
ing effects of Sp are recapitulated by colonization of GF mice 
with spore-forming microbes from the healthy human colonic 
microbiota (hSp) (Figure S3), suggesting that the serotonergic 
function of this community is conserved across mice and 
humans. 

To determine whether the effects of Sp on host 5-HT depend 
on colonic Tph activity, we colonized GF mice with Sp on 
P42 and then administered the Tph inhibitor para-chloropheny- 
lalanine (PCPA) intrarectally twice daily for 3 days prior to 5-HT 
assessments on P56 (Liu et al., 2008). Intrarectal injection of 
PCPA sufficiently blocks the ability of Sp to elevate colon 
and serum 5-HT levels (Figures 3C and S2C), as well as Sp-medi- 
ated increases in 5-HT staining in ECs (Figure 2). Similar effects 
of PCPA treatment on blocking increases in colon 5-HT, serum 
5-HT, and 5-HT staining in colonic ECs are seen in GF 
mice colonized with hSp (Figure S3). Interestingly, inhibiting 
Tph activity with PCPA results in a compensatory increase in 
colonic TPH1 and decrease in SLC6A4 (Figures 3D and S2D) 
expression in Sp-colonized mice, supporting the notion that 
microbiota-dependent changes in 5-HT transporter levels occur
as a secondary response to Tph modulation. 

To further evaluate whether changes in SLC6A4 expression are
necessary for microbiota-mediated alterations in peripheral 5-HT,
we tested the effects of microbiota manipulations on colon and
serum 5-HT in SLC6A4 heterozygous (+/-) and complete (-/-)
knockout (KO) mice. Depleting the microbiota via P42-P56 antibiotic
treatment (Reikvam et al., 2011) of SPF SLC6A4+/- and SLC6A4-/-
mice effectively decreases colonic 5-HT levels (Figures S4A and
S4B), indicating that the microbiota is required for promoting gut
5-HT in Slc6a4-deficient mice. Colonizing antibiotic-treated
SLC6A4+/- and SLC6A4-/- mice with Sp raises colon 5-HT to levels
seen in untreated SPF SLC6A4+/- and SLC6A4-/- mice (Figure S4A),
demonstrating that Slc6a4 is not required for conferring the
effects of Sp on gut 5-HT. Antibiotic-induced decreases and
Sp-induced increases in colon 5-HT levels can be attributed to
modulation of 5-HT content in colonic ECs from SLC6A4+/- and
SLC6A4-/- mice (Figure S4C). Similar effects of antibiotic
treatment and Sp colonization are seen for serum 5-HT in SLC6A4+/-
mice, whereas SLC6A4-/- mice exhibit low to undetectable levels of
serum 5-HT, highlighting the dependence of platelets on
Slc6a4-mediated 5-HT uptake (Figure S4B). Taken together, these
data support a role for Sp in promoting Tph1-mediated 5-HT
biosynthesis by colonic ECs, regulating both colon and serum levels
of 5-HT.

Microbiota-Mediated Regulation of Host Serotonin 
Modulates Gastrointestinal Motility 

Intestinal 5-HT plays an important role in stimulating the enteric
nervous system and GI function (Gershon and Tack, 2007). To
determine whether microbiota-dependent modulation of colonic
5-HT impacts GI motility, we colonized P42 GF mice with Sp and
then tested for GI transit and colonic neuronal activation at P56.
Sp colonization ameliorates GF-associated abnormalities in GI
motility, significantly decreasing total transit time and increasing
the rate of fecal output in a Tph-dependent manner (Figures 4A
and 4B). Similar effects are seen in SLC6A4+/- and SLC6A4-/-
mice, where Sp colonization of antibiotic-treated mice restores
GI transit time toward levels seen in untreated SPF SLC6A4+/-
and SLC6A4-/- controls (Figure S4E).

Consistent with deficits in GI motility, steady-state activation
of 5-HT receptor subtype 4 (5HT4)-expressing cells in the colonic 
submucosa and muscularis externa is decreased in GF mice 
compared to SPF controls, as measured by colocalized expres- 
sion of 5HT4 with the immediate early gene, c-fos (Figures 4C- 
4E). Colonization of GF mice with Sp increases 5HT4+ c-fos+ 
staining to levels seen in SPF mice, and this effect is dependent 
on colonic Tph activity (Figures 4C-4E), which aligns well with 
the understanding that Sp-induced elevations in colonic 5-HT 
promote GI motility by activation of 5HT4+ enteric neurons
(Mawe and Hoffman, 2013). In addition, colonic activation of 
intrinsic afferent primary neurons (IPANs) of the myenteric plexus 
is decreased in GF mice (McVey Neufeld et al., 2013) and 
improved by colonization with Sp, as measured by colocalization 
of c-fos and the IPAN marker, calretinin (Calb2) (Figure 4F). Inhib- 
iting Tph activity with PCPA decreases IPAN activation in Sp- 
colonized mice, suggesting that some IPAN responses to Sp 
depend on host 5-HT synthesis (Figure 4F). Altogether, these 



findings indicate that Sp-mediated increases in colonic 5-HT 
biosynthesis are important for gut sensorimotor function. 

Microbiota-Mediated Regulation of Host Serotonin 
Modulates Platelet Function 

Platelets take up gut-derived 5-HT and release it at sites of vessel
injury to promote blood coagulation. To determine if
microbiota-dependent modulation of colon (Figures 1 and 3) and plasma
(Figure S1A) 5-HT impacts platelet function, we colonized P42
mice with Sp and then examined blood clotting, platelet activation
and platelet aggregation at P56. In a tail bleed assay (Liu
et al., 2012), GF mice exhibit trending increases in time to
cessation of bleeding compared to SPF mice, suggesting impaired
blood coagulation (Figure 5A). Colonization of GF mice with Sp 
ameliorates abnormalities in bleeding time to levels seen in 
SPF controls, and this effect is attenuated by intrarectal admin- 
istration of PCPA (Figure 5A), indicating that Sp-mediated im- 
provements in coagulation may be dependent on colonic Tph 
activity. Notably, the impact of acute colonic PCPA treatment 
on reducing 5-HT content and 5-HT-related functions in platelets 
may be tempered by the fact that mouse platelets have a lifespan 
of ~4 days (Odell and McDonald, 1961). There were no signifi- 
cant differences between treatment groups in total platelet 
counts (Figure S5A). 

In light of inherent limitations of the tail bleed assay (Liu 
et al., 2012), we focused subsequent experiments particularly on 
platelet activity. Platelets isolated from GF mice display decreased 
activation in response to in vitro type I fibrillar collagen stimulation, 
as measured by reduced surface expression of the activation 
markers granulophysin (CD63), P-selectin, and JON/A (integrin 
αIIbβ3) (Figures 5D-5F) (Ziu et al., 2012). Sp colonization of GF 
mice leads to partial restoration in the expression of platelet acti- 
vation markers, and this effect depends on colonic Tph activity 
(Figures 5D-5F). Moreover, platelets isolated from GF mice exhibit 
impaired aggregation in response to in vitro collagen stimulation, 
as measured by decreased levels of high granularity, high mass 
aggregates detected by both flow cytometry (De Cuyper et al., 
2013; Nieswandt et al., 2004) (Figures 5B, 5C, S5C, and S5D) 
and imaging (Figure S5B). Colonization of GF mice with Sp re- 
stores levels of platelet aggregation to those seen in SPF mice. 
These effects of Sp on correcting impaired platelet aggregation 
are attenuated by colonic PCPA injection, indicating dependence 
on Tph activity. Overall, these findings suggest that Sp-mediated 
elevations in colonic 5-HT, and thus platelet 5-HT, promote 
platelet activation and aggregation relevant to hemostasis. 

Microbial Metabolites Mediate Effects of the Microbiota 
on Host Serotonin 

In light of the important role for Sp in regulating 5-HT-related in- 
testinal and platelet function, we aimed to identify specific micro- 
bial factors responsible for conferring the serotonergic effects of 
Sp. Based on our finding that Sp elevates 5-HT particularly in 
colonic ECs (Figure 2), we hypothesized that Sp promotes levels 
of a soluble factor that signals directly to ECs to modulate TPH1 
expression and 5-HT biosynthesis. To test this, we prepared fil- 
trates of total colonic luminal contents from Sp-colonized mice 
and controls and evaluated their effects on levels of 5-HT in 
RIN14B chromaffin cell cultures (Nozawa et al., 2009). Relative 
Figure 4. Microbiota-Mediated Regulation of Host Serotonin Modulates Gastrointestinal Motility 

(A) Total time for transit of orally administered carmine red solution through the GI tract (n = 4-8). 

(B) Defecation rate as measured by number of fecal pellets produced relative to total transit time (n = 4-8). 

(C) Representative images of c-fos and 5HT4 colocalization in the colonic submucosa and muscularis externa (n = 4-5 mice/group). 

(D) Quantitation of total c-fos fluorescence intensity in the colonic submucosa and muscularis externa (n = 4-5 mice/group). 

(E) Quantitation of total 5HT4 fluorescence intensity in the colonic submucosa and muscularis externa (n = 4-5 mice/group). 

(F) Quantitation and representative images of c-fos and calb2 (calretinin) colocalization in the colonic submucosa and muscularis externa (n = 5-8 mice/group). 
Data are presented as mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. SPF, specific pathogen-free (conventionally-colonized); GF, germ-free; Sp, 
spore-forming bacteria; PCPA, para-chlorophenylalanine. 

See also Figure S4. 



to vehicle-treated controls, there is no significant effect of filtered 
colonic luminal contents from GF mice on levels of 5-HT released 
or TPH1 expressed from RIN14B cells (Figures 6A and 6B). 



Filtered colonic luminal contents from SPF and Sp-colonized 
mice sufficiently induce 5-HT from RIN14B cells (Figure 6A), to 
levels comparable to those elicited by the calcium ionophore, 
Figure 5. Microbiota-Mediated Regulation of Host Serotonin Modulates Hemostasis 

(A) Time to cessation of bleeding in response to tail injury (n = 7-16). 

(B) Platelet activation, as measured by percentage of large, high granularity (FSC^high, SSC^high) events after collagen stimulation relative to unstimulated controls 
(n = 3). 

(C) Representative flow cytometry plots of large, high granularity (FSC^high, SSC^high) activated platelets after collagen stimulation (bottom), as compared to 
unstimulated controls (top) (n = 3). 

(D-F) Geometric mean fluorescence intensity of granulophysin (CD63) (D), P-selectin (E), and JON/A (integrin αIIbβ3) (F) expression in collagen-stimulated 
platelets (left). Representative histograms (right) of event count versus fluorescence intensity (log scale) for platelets treated with collagen (red line) or vehicle (blue 
line) (n = 3). 

Data for platelet assays are representative of three independent trials with at least three mice in each group. Data are presented as mean ± SEM. *p < 0.05, **p < 
0.01, ***p < 0.001, ****p < 0.0001. n.s., not statistically significant; SPF, specific pathogen-free (conventionally-colonized); GF, germ-free; Sp, spore-forming 
bacteria; PCPA, para-chlorophenylalanine. 

See also Figure S5. 



ionomycin, as a positive control. TPH1 expression is also ele- 
vated in chromaffin cells exposed to SPF and Sp luminal filtrates, 
suggesting increased 5-HT synthesis. This is in contrast to 
ionomycin, which stimulates 5-HT release, but has no effect on 
TPH1 expression, from RIN14B cells. Importantly, these findings 
suggest that microbiota-mediated increases in gut 5-HT are 
conferred via direct signaling of a soluble, Sp-modulated factor 
to colonic ECs. 

We utilized metabolomic profiling to identify candidate Sp- 
dependent, 5-HT-inducing molecules in feces from adult mice. 
Sp colonization of GF mice leads to statistically significant alterations 
in 75% of the 416 metabolites detected, of which 76% are 
elevated and 24% are reduced, relative to vehicle-treated GF 
controls (Tables S1 and S3). Similar changes are seen with 
hSp colonization, leading to co-clustering of Sp and hSp sam- 
ples by principal components analysis (PCA) (Figure 6C). ASF 
colonization has a mild effect, significantly modulating 50% of 
metabolites detected (66% increased, 36% decreased) (Table 
S3), and forming a distinct but proximal cluster to GF controls 
by PCA (Figure 6C). Postnatal conventionalization of GF mice 
with an SPF microbiota alters 66% of all metabolites detected 
(59% increased, 41% decreased) (Table S3) and produces substantial 
changes in the metabolome that are distinguishable from 
the effects of Sp, hSp, and ASF along PC2 (Figure 6C). Notably, 
Sp, hSp, and SPF colonization result in similar shifts along PC1, 
compared to vehicle and ASF-treated controls, suggesting common 
metabolic alterations among communities that similarly 
elevate peripheral 5-HT levels. Metabolomics profiling confirms 
that fecal 5-HT is commonly upregulated in the Sp, hSp, and 
SPF fecal metabolome and comparatively low in ASF and GF 
samples (Table S1). Simple linear regression reveals 83 metabolites 
that co-vary with 5-HT (r² > 0.25), 47 of which correlate 
positively and 36 of which correlate negatively with 5-HT levels 
(Figure S6A; Table S4). 
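
The regression screen described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis pipeline: the function name `screen_covarying`, the input layout (one list of per-sample levels per metabolite), and the example values are hypothetical. For a simple linear regression with one predictor, r² equals the squared Pearson correlation, so the sign of r separates positive from negative covariates.

```python
import numpy as np

def screen_covarying(metabolites, serotonin, threshold=0.25):
    """Split metabolites into positive/negative covariates of 5-HT.

    metabolites: dict mapping a metabolite name to per-sample levels
                 (hypothetical layout, chosen for illustration).
    serotonin:   per-sample 5-HT levels, in the same sample order.
    threshold:   r^2 cutoff; 0.25 is the value quoted in the text.
    """
    y = np.asarray(serotonin, dtype=float)
    positive, negative = [], []
    for name, levels in metabolites.items():
        x = np.asarray(levels, dtype=float)
        # Pearson r; r**2 is the coefficient of determination of the fit.
        r = np.corrcoef(x, y)[0, 1]
        if r * r > threshold:
            (positive if r > 0 else negative).append(name)
    return positive, negative

# Made-up example values, not data from the study.
example = {
    "up": [2, 4, 6, 8, 10],
    "down": [5, 4, 3, 2, 1],
    "flat": [1, -1, 0, -1, 1],
}
pos, neg = screen_covarying(example, [1, 2, 3, 4, 5])
```

A cutoff on r² alone says nothing about statistical significance, so a real screen would likely pair it with a p value or a multiple-testing correction.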

To determine whether specific metabolites mediate the effects 
of Sp on 5-HT, we tested a subset of biochemicals that were 
commonly upregulated by Sp, hSp, and SPF, and that positively 
correlated with 5-HT levels (Figure S6A; Table S4), for their ability 
to induce 5-HT in vitro and in vivo. We also tested the short chain 
fatty acids, acetate, butyrate, and propionate, which were previ- 
ously shown to be produced by Sp (Atarashi et al., 2013) and to 
stimulate 5-HT release from ECs (Fukumoto et al., 2003). Of 16 
metabolites examined, α-tocopherol, butyrate, cholate, deoxycholate, 
p-aminobenzoate (PABA), propionate, and tyramine 
elevate 5-HT in RIN14B chromaffin cell cultures (Figure 6D). Ele- 
vations in 5-HT correspond to increases in TPH1 expression 
from RIN14B cells (Figure 6E), suggesting that particular metab- 
olites induced by Sp enhance 5-HT biosynthesis by ECs. We 
further tested for sufficiency to induce 5-HT in vivo. Notably, 
raising luminal concentrations of deoxycholate in colons of GF 
mice to levels seen in SPF mice (Sayin et al., 2013) sufficiently increases 
colon and serum 5-HT compared to vehicle-injected 
controls (Figures 6F and S6B). This restoration of peripheral 
5-HT correlates with elevations in colonic TPH1 expression (Figure 6F). 
Increases in colon and serum 5-HT are also seen with injection 
of α-tocopherol, PABA and tyramine into colons of GF 
mice (Figures S6B and S6C). Consistent with in vitro RIN14B 
data, oleanolate has no statistically significant effect on elevating 
colon or serum 5-HT in GF mice (Figures S6B and S6C). Importantly, 
the effects of a single rectal injection of deoxycholate or 
α-tocopherol on raising colon 5-HT levels in GF mice are weak 
and transient, peaking within 1 hr of injection (Figure S6C). 
Consistent with this, there is no significant effect of acute colonic 
metabolite injection on GI transit time (Figure S6D), and there is 
only a trending improvement in platelet activation (Figure S6E). 
Our finding that Sp colonization leads to lasting increases in colon 
and blood 5-HT levels (Figure 3), and long-term changes in 
the fecal metabolome (Figure 6C; Tables S1 and S3), suggests 
that Sp colonization results in persistent elevations of 5-HT- 
modulating luminal metabolites. Future studies on whether 
chronic, colon-restricted increases in Sp-regulated metabolites 
sufficiently correct GI motility and platelet function in GF mice, 
and whether this occurs in a 5-HT-dependent manner, are war- 
ranted. In addition, we demonstrate that select concentrations of 
Sp-associated metabolites sufficiently promote 5-HT in vitro and 
in vivo, but whether the metabolites are necessary for mediating 
the serotonergic effects of Sp is unclear. Overall, these data 
reveal that indigenous spore-forming microbes promote 5-HT 
biosynthesis from colonic ECs, modulating 5-HT concentrations 
in both colon and blood. Furthermore, we identify select micro- 
bial metabolites that confer the serotonergic effects of indige- 
nous spore-forming microbes, likely by signaling directly to 
colonic ECs to promote Tph1 expression and 5-HT biosynthesis. 

DISCUSSION 

The GI tract is an important site for 5-HT biosynthesis, but the 
regulatory mechanisms underlying the metabolism of gut- 
derived 5-HT are incompletely understood. Here, we demon- 
strate that the gut microbiota plays a key role in promoting levels 
of colon and blood 5-HT, largely by elevating synthesis by host 
ECs. This host-microbiota interaction contributes to a growing 
appreciation that the microbiota regulates many aspects of GI 
physiology by signaling to host cells. Whether particular members 
of the microbiota contribute 5-HT by de novo synthesis remains 
unclear. Some bacteria, including Corynebacterium spp., 
Streptococcus spp., and Escherichia coli, are reported to synthesize 
5-HT in culture (Roshchina, 2010), but this is believed 
to occur independently of Tph, by decarboxylation of tryptophan 
to tryptamine (Williams et al., 2014), as seen in plants (Oleskin 
et al., 1998). Our finding that colonic PCPA administration blocks 
the ability of the microbiota to promote colonic and blood 5-HT 
(Figures 3C and 3D) suggests that gut microbes require host 
Tph activity to upregulate peripheral 5-HT. Furthermore, SPF 
Tph1 KO mice lack >90% of intestinal and blood 5-HT levels (Savelieva 
et al., 2008), indicating that <10% of peripheral 5-HT is 
contributed directly by microbial synthesis or by Tph2-mediated 
biosynthesis in these mice. We find that the microbiota regulates 
relatively high levels of peripheral 5-HT, 64% of colonic (Figure 1) 
and 49% of serum concentrations (Figure 1) (Sjogren et al., 2012; 
Wikoff et al., 2009), further supporting the notion that the micro- 
biota modulates 5-HT metabolism primarily by affecting host 
colonic ECs. Consistent with the understanding that ECs secrete 
low levels of 5-HT into the lumen, fecal concentrations of 5-HT 
are also significantly increased by the microbiota. Interestingly, 
5-HT is reported to stimulate the growth of Enterococcus faecalis, 
E. coli, and Rhodospirillum rubrum in culture (Oleskin 
et al., 1998; Tsavkelova et al., 2006). In addition, 5-HT is a structural 
analog of auxins found in E. faecalis, R. rubrum, and Staphylococcus 
aureus, among other bacteria. Whether particular 
members of the microbiota alter host 5-HT biosynthesis to, in 
turn, support colonization, growth, or resilience of particular 
gut microbes is an interesting question for future study. 

We demonstrate that indigenous spore-forming microbes 
from colons of SPF mice (Sp) and from a healthy human 
colon (hSp) sufficiently mediate microbiota effects on colonic 
and blood 5-HT. While we show that B. fragilis, B. uniformis, 
SFB, ASF, and a consortium of Bacteroides species cultured 
from mice, including B. thetaiotaomicron, B. acidifaciens, and 
B. vulgatus, have no effect on host peripheral 5-HT (Figure 3), 
whether other non-Sp microbial species or communities are 
capable of modulating colonic and serum 5-HT remains unclear. 
Interestingly, Sp and hSp are known to promote regulatory T cell 
levels in the colons, but not small intestines, of GF and SPF mice 
(Atarashi et al., 2013). This regional specificity is also seen with 
microbiota-induced 5-HT biosynthesis, which occurs in colonic, 
but not small intestinal, ECs (Figures S1A, S2A, and S2B). We 
find that Sp elevates colon 5-HT levels even in Rag1 KO mice 
(Figure S2G), indicating that the serotonergic effects of Sp are 
not dependent on T and B cells. Whether 5-HT modulation con- 
tributes to the immunosuppressive effects of Sp, however, is un- 
clear. In light of increasing evidence that innate and adaptive im- 
mune cells express a variety of 5-HT receptors (Baganz and 
Blakely, 2013), future studies examining whether Sp-mediated 
increases in peripheral 5-HT levels impact cellular immune re- 
sponses will be of interest. 

Consistent with our finding that the microbiota modulates co- 
lon and serum 5-HT via interactions with host colonic ECs, we 
find that particular fecal metabolites are similarly elevated by 
SPF, Sp, and hSp microbiota and sufficiently promote 5-HT in 
chromaffin cell cultures and in vivo (Figure 6; Table S1). Deoxycholate 
is a secondary bile acid, produced by microbial biotransformation 
of cholate. In addition to facilitating lipid absorption, it 
has endocrine, immunological, and antibiotic effects and is 
Figure 6. Microbial Metabolites Mediate Effects of the Microbiota on Host Serotonin 

(A) Levels of 5-HT released from RIN14B cells after exposure to colonic luminal filtrate from SPF, GF, and Sp-colonized mice, or to ionomycin (iono). Data are 
normalized to 5-HT levels in vehicle-treated controls (hatched gray line at 1). Asterisks directly above bars indicate significance compared to controls; asterisks at 
the top of the graph denote significance between experimental groups (n = 3). 

(B) Expression of TPH1 relative to GAPDH in RIN14B cells after exposure to colon luminal filtrate from SPF, GF and Sp-colonized mice, or to ionomycin (iono). 
Data are normalized to gene expression in vehicle-treated controls (hatched gray line at 1). Asterisks directly above bars indicate significance compared to 
controls, whereas asterisks at the top of the graph denote significance between experimental groups (n = 4). 

(C) Principal components analysis of the fecal metabolome from GF mice colonized with SPF, ASF, Sp, or hSp (n = 6). 

(D) Levels of 5-HT released from RIN14B cells after exposure to metabolites: acetate (1 mM), α-tocopherol (8 µM), arabinose (50 µM), azelate (50 µM), butyrate 
(100 µM), cholate (75 µM), deoxycholate (25 µM), ferulate (25 µM), GABA (25 µM), glycine (50 µM), N-methyl proline (0.5 µM), oleanolate (50 µM), p-aminobenzoate 
(1 µM), propionate (100 µM), taurine (50 µM), and tyramine (100 µM). Data are normalized to 5-HT levels in vehicle-treated controls (gray line at 1) (n = 5-19). 

(E) Expression of TPH1 relative to GAPDH in RIN14B cells after metabolite exposure. Data are normalized to expression in vehicle-treated controls (gray line at 1) 
(n = 3-4). 

(F) Levels of 5-HT in colons (left) and serum (center) of GF mice at 30 min after intrarectal injection of deoxycholate (125 mg/kg) or vehicle. Expression of TPH1 
relative to GAPDH (right) at 1 hr post injection (n = 3-8). 

reported to modulate the microbiota (Islam et al., 2011) and the 
severity of Clostridium difficile and Campylobacter jejuni infections 
(Buffie et al., 2014; Malik-Kale et al., 2008). Detrimental effects 
are also observed; deoxycholate exhibits carcinogenic 
properties and is linked to various cancers (Bernstein et al., 
2011; Yoshimoto et al., 2013). Notably, deoxycholate is reported 
to promote GI motility by activating TGR5 G protein-coupled receptors 
on ECs (Alemi et al., 2013), which is consistent with our 
finding that Sp-induced metabolites raise 5-HT levels in ECs and 
that Sp colonization improves GI motility. Particular Clostridium 
species are known to possess high 7α-dehydroxylation activity 
required for the production of deoxycholate from cholate (Kita- 
hara et al., 2001; Narushima et al., 2006), which is in line with 
our finding that Sp microbes, comprised largely of Clostridia, in- 
crease deoxycholate levels. Deoxycholate concentrations are 
substantially higher in the colon versus small intestine (Sayin 
et al., 2013), which, coupled with the finding that bacterial load 
and diversity are greater in the colon versus small intestine (Sekirov 
et al., 2010), could contribute to the regional specificity of 
microbiota-mediated increases in 5-HT synthesis to colonic 
ECs. Phylogenetic analysis of 16S rDNA sequences reveals 
that a subset of microbes recovered from Sp-colonized mice 
cluster taxonomically with known 7α-dehydroxylating Clostridia 
(Figures 6G and S7). Notably, there are striking phylogenetic 
commonalities between taxa identified in Sp- and hSp-colonized 
mice (Figure S7), consistent with their very similar luminal metab- 
olomic profiles (Figure 6C) and ability to promote 5-HT synthesis 
from colonic ECs (Figure S3). 

We also reveal that the metabolites α-tocopherol, tyramine, 
and PABA are elevated in feces by Sp, hSp, or SPF colonization, 
co-vary with fecal 5-HT levels, and sufficiently induce 5-HT 
in vitro and in vivo (Figures 6 and S6; Table S1). α-Tocopherol 
is a naturally abundant form of vitamin E, with reported therapeutic 
effects for several diseases (Brigelius-Flohé and Traber, 
1999). Interestingly, patients with depression exhibit decreased 
plasma α-tocopherol (Maes et al., 2000; Owen et al., 2005), 
and treatment with α-tocopherol reduces depressive-like behavior 
in pre-clinical models (Lobato et al., 2010), suggesting a 
link between α-tocopherol and 5-HT-related disease. Tyramine 
is a trace amine that acts as a neurotransmitter and catechol- 
amine-releasing agent. Particular bacteria can produce tyramine 
by decarboxylation of tyrosine in the gut, where tyramine is re- 
ported to stimulate fast ileal contractions and neuropeptide Y 
release (Marcobal et al., 2012). PABA is an intermediate of folic 
acid synthesis and essential nutrient for some bacteria. Partic- 
ular species can generate PABA from chorismate (de Crecy-La- 
gard et al., 2007), but physiological roles for PABA in the Gl tract 
are unclear. Subsets of microbes from Sp- and hSp-colonized 
mice relate phylogenetically to Clostridia with putative genes 
for α-tocopherol and tyrosine metabolism (Figures 6G and S7). 
Screening Sp microbes for target metabolic functions could 
serve as a tractable approach for further parsing the Sp consortium 
into the minimal species required for increasing 5-HT 
biosynthesis by ECs. 

While there is increasing evidence for a bi-directional rela- 
tionship between the gut microbiota and gut sensorimotor 
function, the particular microbes and mechanisms involved 
are unclear. The microbiota is required for normal IPAN excit- 
ability (McVey Neufeld et al., 2013), and recent studies reveal 
that changes in the microbiota can alter levels of neuroactive 
molecules, such as nitric oxide, substance P and endocannabi- 
noids, which have the potential to influence gut motor activity 
(Quigley, 2011). Mucosal immune responses (Collins, 1996), 
including key interactions between macrophages and enteric 
neurons (Muller et al., 2014), also modulate GI motility via the 
gut microbiota. It will be interesting to determine whether 5- 
HT-mediated effects on immunity (Baganz and Blakely, 2013) 
contribute to its effects on GI motility. Notably, deconjugated 
bile salts are reported to alter gut sensorimotor activity (Ap- 
pleby and Walters, 2014), which supports our hypothesis that 
Sp-induced increases in deoxycholate, among other metabo- 
lites, contribute to its ability to elevate colonic 5-HT and 
decrease intestinal transit time. 

While we demonstrate that Sp-mediated induction of colonic 
and blood 5-HT regulates GI motility and platelet function in 
mice, further research is needed to explore additional implica- 
tions of microbially induced 5-HT on host health and disease 
(O’Mahony et al., 2015). Peripheral 5-HT modulates several 
cellular processes, including osteoblast differentiation, eryth- 
ropoiesis and immunity. Moreover, gross abnormalities in brain 
structure are observed in Tph1+/− embryos from Tph1−/− 
mothers (Cote et al., 2007), indicating that maternal peripheral 
5-HT is important for offspring neurodevelopment. Placentally- 
derived 5-HT also influences neurodevelopment, influencing 
thalamocortical axon guidance (Bonnin et al., 2011). Interest- 
ingly, the indigenous microbiota also modulates hippocampal 
levels of 5-HT (Clarke et al., 2013), revealing a role for the mi- 
crobiota in regulating the brain serotonergic system. Overall, 
our findings provide a mechanism by which select microbes 
and their metabolic products can be used to promote endoge- 
nous, localized 5-HT biosynthesis and further alter host 
physiology. 

EXPERIMENTAL PROCEDURES 

See Supplemental Information for additional details and references. 

PCPA Treatment 

At 2 weeks post-bacterial treatment, mice were anesthetized with isoflurane, 
and PCPA (90 mg/kg) (Liu et al., 2008) was administered intrarectally every 
12 hr for 3 days using a sterile 3.5 Fr silicone catheter inserted 4 cm into the 
rectum. Mice were suspended by tail for 30 s before return to the home 



(G) Phylogenetic tree displaying key Sp (M) and hSp (H) operational taxonomic units (OTUs) relative to reference Clostridium species with reported 7α-dehydroxylation 
activity (red circles). Relative abundance is indicated in parentheses (n = 3). 

Data are presented as mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. n.s., not statistically significant; SPF, specific pathogen-free 
(conventionally-colonized); GF, germ-free; Sp, spore-forming bacteria; iono, 15 µM ionomycin; ASF, Altered Schaedler Flora; hSp, human-derived spore-forming 
bacteria. 

See also Figures S6 and S7. 
cage. For mock treatment, mice were anesthetized and intrarectally injected 
with sterile water as vehicle. 

Serotonin Measurements 

Serotonin levels were detected in sera and supernatant of tissue homogenates 
by ELISA according to the manufacturer’s instructions (Eagle Biosciences). 
Readings from tissue samples were normalized to total protein content as de- 
tected by BCA assay (Thermo Pierce). Data compiled across multiple experi- 
ments are expressed as 5-HT concentrations normalized to SPF controls 
within each experiment. 
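
The two normalization steps described above (tissue readings per unit protein, then values relative to SPF controls within each experiment) can be sketched as below. This is a minimal illustration under assumptions: the function names, the units, and the per-group dictionary layout are hypothetical and are not taken from the study's analysis.

```python
from statistics import mean

def per_protein(elisa_reading, total_protein):
    """Normalize one tissue ELISA reading to total protein content.

    Units are an assumption for illustration (e.g. ng 5-HT per mg protein).
    """
    return elisa_reading / total_protein

def relative_to_reference(values_by_group, reference="SPF"):
    """Express every value as a fraction of the mean of the reference group.

    values_by_group: dict mapping a group label to the values measured in
                     ONE experiment (hypothetical layout). Apply this per
                     experiment before pooling data across experiments.
    """
    ref_mean = mean(values_by_group[reference])
    return {
        group: [value / ref_mean for value in values]
        for group, values in values_by_group.items()
    }

# Made-up numbers, not data from the study.
experiment = {"SPF": [2.0, 4.0], "GF": [1.5]}
normalized = relative_to_reference(experiment)
```

Normalizing within each experiment before pooling keeps plate-to-plate differences in ELISA signal from inflating the variance between groups.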

RIN14B In Vitro Culture Experiments 

RIN14B cells (ATCC) were seeded at 10^ cells/cm^ and cultured according to 
methods described in Nozawa et al. (2009). Total colonic luminal contents 
were collected from adult SPF, GF, and GF mice colonized with spore-forming 
bacteria, suspended at 120 µl/mg in HBSS supplemented with 0.1% BSA and 
2 µM fluoxetine, and centrifuged at 12,000 × g for 10 min. Supernatants were 
passed through 0.2 µm pore syringe filters. Cultured RIN14B cells were incubated 
with colonic luminal filtrate for 1 hr at 37°C. 

GI Transit Assay 

Mice were orally gavaged with 200 µl sterile solution of 6% carmine red (Sigma 
Aldrich) and 0.5% methylcellulose (Sigma Aldrich) in water and placed in a new 
cage with no bedding (Li et al., 2011). Starting at 120 min post-gavage, mice 
were monitored every 10 min for production of a red fecal pellet. GI transit 
time was recorded as the total number of minutes elapsed (rounded to the 
nearest 10 min) before production of a red fecal pellet. For mice treated intrarectally 
with PCPA or metabolites, GI transit assay was conducted 1 hr after 
the third injection. 

Platelet Activation and Aggregation Assays 

Blood samples were collected by cardiac puncture, diluted with a 2× volume 
of HEPES medium and centrifuged through PST lithium heparin vacutainers 
(Becton Dickinson). Expression of platelet activation markers was 
measured by flow cytometry (Nieswandt et al., 2004; Ziu et al., 2012). Platelet 
aggregation assays were conducted according to methods described in De 
Cuyper et al. (2013). Remaining unstained PRP was used to generate PRP 
smears. Slides were stained with Wright Stain (Cameo) according to stan- 
dard procedures. 

16S rRNA Gene Sequencing and Analysis 

Fecal samples were collected at 2 weeks after orally gavaging GF mice with Sp 
or hSp. Bacterial genomic DNA was extracted from mouse fecal pellets using 
the QIAamp DNA Stool Mini Kit (QIAGEN). The library was generated according 
to methods from Caporaso et al. (2011). The V4 regions of the 16S rRNA 
gene were PCR amplified, purified and then sequenced using the Illumina 
MiSeq platform. Operational taxonomic units (OTUs) were chosen de novo 
with UPARSE pipeline (Edgar, 2013). Taxonomy assignment and rarefaction 
were performed using QIIME 1.8.0 (Caporaso et al., 2010). Phylogenetic trees 
were built using PhyML (Guindon et al., 2010) and visualized using iTOL (Letunic 
and Bork, 2007). 

See also Extended Experimental Procedures. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Discussion, Extended Experi- 
mental Procedures, seven figures, and four tables and can be found with 
this article online at http://dx.doi.org/10.1016/j.cell.2015.02.047. 
SUMMARY 

Coordinated organ behavior is crucial for an effective 
response to environmental stimuli. By studying 
regeneration of hair follicles in response to patterned 
hair plucking, we demonstrate that organ-level 
quorum sensing allows coordinated responses to 
skin injury. Plucking hair at different densities leads 
to a regeneration of up to five times more neighboring, 
unplucked resting hairs, indicating activation 
of a collective decision-making process. Through 
data modeling, the range of the quorum signal was 
estimated to be on the order of 1 mm, greater than 
expected for a diffusible molecular cue. Molecular 
and genetic analysis uncovered a two-step mechanism, 
where release of CCL2 from injured hairs leads 
to recruitment of TNF-α-secreting macrophages, 
which accumulate and signal to both plucked and 
unplucked follicles. By coupling immune response 
with regeneration, this mechanism allows skin to 
respond predictively to distress, disregarding mild 
injury, while meeting stronger injury with full-scale 
cooperative activation of stem cells. 

INTRODUCTION 

The effective coordination of organ behavior, either under physiological 
conditions or as a response to injury, is essential for survival. 
Integration at the level of large-scale organ systems has 
been extensively studied, but the role of shorter range, local coordination 
has not. For example, is the regeneration of repeated 
tissue units within an organ (e.g., hair follicles [HFs] in skin, villi in 
intestine) coordinated so as to achieve collective decision-mak- 
ing? If so, what are the mechanisms of communication, and how 
is information integrated? In particular, if injury or malfunction af- 
fects only a subset of tissue units in an organ, how is such a col- 
lective decision made whether to mount a response that is local 
(e.g., local repair) or global (e.g., tissue level regeneration)? 

Mammalian skin offers an excellent platform to address such 
questions, because its numerous HFs behave as discrete, re- 
peating, semi-autonomous tissue units (Jahoda and Christiano, 
2011) distributed on a 2D plane. HFs undergo cyclic regeneration 
(Paus et al., 1998) by regulating both intra- and extra-follicular 
cues for hair stem cell activation (Stenn and Paus, 2001; Plikus 
et al., 2008, 2011; Festa et al., 2011; Chen and Chuong, 2012), 
both during physiological regeneration and in response to injury 
(Chuong et al., 2012). The experimental accessibility of HFs 
makes them an ideal model to study collective decision making 
in an organ population in vivo. 

Classical studies show that hair plucking produces a micro- 
injury that can potentially lead to hair regeneration (Collins, 
1918; Silver and Chase, 1970). This process is thought to be 
mediated by an autonomous mechanism in each follicle, in which 
early apoptosis in the bulge leads to activation of hair germ pro- 
genitors (Ito et al., 2002). Here, we uncover evidence that the de- 
cision of hair stem cells to be activated or remain quiescent also 
depends on information coming from neighboring follicles. This 
possibility was first suggested by our earlier study in which 
plucking fewer than 50 refractory telogen hairs did not induce 
hair regeneration, while plucking more than 200 hairs did (Plikus 
et al., 2008). Here, by varying the spacing, arrangement, and 
shapes of plucked regions, we unexpectedly found that plucking 
200 hairs with a proper topological distribution can cause up to 
1,200 hairs to regenerate. These results demonstrate marked 
non-autonomy in HF regeneration and a distinctly non-linear 
quantitative relationship between plucking and regeneration. 

As discussed below, the collective HF response to injury may 
be seen as an example of quorum sensing, a form of social 
behavior in which population decisions depend on the density 
of signaling individuals within a given spatial territory (Bassler, 
2002; Pratt, 2005). In order to gain insights into the possible mech- 
anisms underlying this behavior, we first used mathematical 
modeling to identify the characteristic spatial range over which 
quorums are sensed, which led us to suspect that the signaling 
mechanism consists of more than just a diffusible molecule. The 
time course of molecular changes after plucking, together with 
the results of genetic and pharmacological manipulation, impli- 
cated a two-stage mechanism, involving the release of diffusible 
signals that recruit immune cells (M1 macrophages), which then 
actively spread among follicles, where they locally induce regeneration 
through the release of substances such as Tnf-α. 

This work identifies a mechanism for quorum sensing that op- 
erates on the millimeter scale to coordinate the behaviors of 
semi-autonomous tissue units within an organ. Such coordina- 
tion enables the skin to condition its responses to the spatial 
extent of injuries, launching a full scale regenerative response 
only when a sufficient threshold is reached. 

RESULTS 

Topology-Dependent Hair Plucking Can Induce the 
Regeneration of More Hairs Than Were Plucked by 
Activating Neighboring, Unplucked Follicles 

To gain insight into the mechanisms leading to hair renewal 
following follicle injury, the relationship between hair plucking 
density and regeneration was examined in the mouse. To standardize 
the experiments, we synchronized all the dorsal pelage 
HFs into refractory telogen before plucking (see Extended 
Experimental Procedures). Normal hair density in adult C57BL/6 
mouse dorsal skin is ~45-60 hairs/mm², corresponding to a 
distance between each follicle of ~0.15 mm (Figure S1E). In 
the first set of experiments, 200 evenly distributed refractory telogen 
hairs were plucked within a circular skin area (the “injury 
field”; Figures 1B and 1C, red circle). By plucking a constant 
number of hairs but altering the size of the injury field (Figures 
1A, 1B, and S1), plucking densities from 2-50 hairs/mm² were 
obtained (Figures 1 and S1; Extended Experimental Procedures). 
We then studied the regenerative behavior of the HFs. 

We observed three types of responses (Figure 1F). First, if 200 
hairs were plucked in a large area (>6 mm diameter, 28.3 mm², 
plucking density <10 hairs/mm²) (Figures 1B and S1), no regeneration 
of plucked or unplucked follicles occurred even after 30 days 
(Figures 1A, 1B, and S1). This is because the plucking density is 
too low and does not generate accumulated signals above the 
threshold level (Figure 1F, zone of very low density plucking, 
gray area). Second, when 200 hairs were plucked from 3-, 4-, or 
5-mm diameter circular areas (plucking density >10 hairs/mm², 
the threshold density), we induced simultaneous regeneration 
of the whole region (including the plucked and surrounding 
unplucked follicles) (Figures 1C and 1D). Thus, by plucking only 
200 hairs, the eventual regeneration of approximately 450, 780, 
or 1,300 hairs was obtained (with 200 hairs plucked in injury fields 
of 3, 4, and 5 mm in diameter, or 7.1, 12.6, and 19.6 mm², 
respectively) (Figures 1C-1F, S1, and S2). As an example, we 
can induce regeneration of up to 600 unplucked hairs within a 
5-mm plucked region (zone of quorum sensing-dependent hair 
regeneration, orange/light green area in Figure 1F) and 400 hairs 
outside of the plucked region, resulting from propagation. Third, 
when 200 hairs were plucked from a 2.4-mm diameter region (high 
density, 100% plucking), every follicle in the field was plucked 
(Figure 1A). In this case, all follicles re-entered anagen ~12 days 
(12.3 ± 3.37, n = 13) after plucking, and the number of regenerating 
follicles equaled the number of plucked follicles (zone of all follicles 
plucked, dark green). Plucking-dependent regeneration 
from refractory telogen requires the plucking of at least 50 follicles 
to reach the basal threshold (Plikus et al., 2008). This zone 
is equivalent to the frequently used wax stripping procedure 
(Müller-Röver et al., 2001), in which melted wax is used to strip 
away all follicles in a large region, usually centimeters in diameter 
or bigger. This method involves thousands of HFs, which 
regenerate in synchrony without using quorum sensing. 
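
The arithmetic behind the field sizes and plucking densities quoted above can be checked with a short sketch. The function names are hypothetical, and the square-lattice spacing estimate is an assumption made here for illustration only; the 10 hairs/mm² threshold and the 200 plucked hairs are taken from the text.

```python
import math

THRESHOLD_DENSITY = 10.0  # hairs/mm^2, threshold density quoted in the text


def field_area_mm2(diameter_mm: float) -> float:
    """Area of a circular injury field of the given diameter."""
    return math.pi * (diameter_mm / 2.0) ** 2


def plucking_density(n_plucked: int, diameter_mm: float) -> float:
    """Plucked hairs per mm^2 when n_plucked hairs are spread over the field."""
    return n_plucked / field_area_mm2(diameter_mm)


def lattice_spacing_mm(hairs_per_mm2: float) -> float:
    """Follicle spacing if follicles sat on a square lattice (an assumption)."""
    return 1.0 / math.sqrt(hairs_per_mm2)


for diameter in (2.4, 3, 4, 5, 6, 12):
    density = plucking_density(200, diameter)
    side = "above" if density > THRESHOLD_DENSITY else "below"
    print(f"{diameter:>4} mm field: {field_area_mm2(diameter):6.1f} mm^2, "
          f"{density:5.1f} hairs/mm^2 ({side} threshold)")
```

Under these assumptions the 5-mm field sits just above the threshold density and the 6-mm field falls below it, which matches the boundary between the first and second response types.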

The Hair Follicle Population as a Quorum-Sensing 
System 

The density-dependence of regeneration, together with the 
simultaneous regeneration of both plucked and unplucked 
follicles within the injury field, suggests that plucked follicles pro- 
duce a signal that (1) spreads to neighboring follicles, (2) accu- 
mulates to a level that depends upon the density and position 
of other plucked follicles, and (3) when present above some 
threshold level will trigger any follicle (plucked or unplucked) 
to re-enter anagen. 

The idea that HFs produce signals that affect other HFs can be 
inferred from the coordinated waves of hair cycling that travel 
across the skin of mice and rabbits (Plikus et al., 2008, 2011). 
Yet the signals that coordinate such “hair waves” cannot explain 
the collective regenerative responses seen here, at least not 
those within the injury field itself. This is because hair waves 
reflect the ability of follicles in anagen to accelerate the progres- 
sion of neighboring telogen follicles into anagen, whereas pluck- 
ing causes injured and uninjured follicles to progress from refrac- 
tory telogen to regenerate collectively and simultaneously (i.e., 
regeneration is not driven by neighboring anagen follicles). Just 
outside the plucked injury fields (e.g., Figure 1C, outside of the 
red circle), however, the ring of delayed regeneration likely re- 
flects the “hair wave” phenomenon, since follicles in this zone 
enter anagen only after regeneration in neighboring follicles is 
well underway. To avoid confusion between initial, collective 
regeneration and later hair wave spreading, the present study fo- 
cuses exclusively on early regenerative events. 

One way to gain insight into the nature of the quorum signal 
(which we shall initially call the “distressor”) that coordinates 
collective regeneration is to characterize its decay length, i.e., the 
characteristic spatial scale over which the strength of the signal 
decays. Decay lengths quantify the balance between the rate at 
which a signal spreads and the rate at which it is destroyed or 
removed. Diffusible molecules that are captured by high-affinity, 
cell surface receptors (e.g., morphogens, growth factors, cytokines, 
and chemokines) tend to have relatively short decay 
lengths, typically on the order of no more than 100 μm (Teleman 
and Cohen, 2000; Müller et al., 2012; Sarris et al., 2012; Weber 
et al., 2013; Shimozono et al., 2013), approximately the same scale 
as the inter-follicular distance in mouse skin. In contrast, as we 
show in the next section, the decay length of the putative distressor 
induced by plucking appears to be substantially larger, on the 
order of 1 mm, or four to six inter-follicular distances. 

Figure 1. Plucking-Induced Hair Regeneration Is a Population-Based 
Behavior that Depends on the Density and Distribution of 
Plucked-Hair Follicles within the Unplucked Follicle Population 

(A and B) Plucking 200 hairs from a circular area 2.4 mm 
in diameter (100% plucking) leads to hair 
regeneration 12 days later. Plucking 200 hairs in a 
12 mm diameter area (100 mm² area; low density 
plucking) fails to induce follicle regeneration even 
30 days later. 

(C) Plucking induces regeneration of all follicles 
(the 200 plucked and 600 unplucked) within the 
plucked area (red circle, 5 mm in diameter). 
Unplucked follicles (400 HFs in total) outside the 
plucked area boundary then regenerate due to hair 
wave propagation (blue circle). 

(D) High power view showing unplucked follicle 
regeneration: the old gray club hair (yellow) is 
pushed out by the regenerating black anagen hair 
(red). 

(E) In this schematic drawing, gray dots represent 
telogen HFs. Black lines encircle exemplary 
plucked regions. Plucked follicles (purple dots). 
Regenerating plucked HFs (green dots). 
Regenerating unplucked HFs (tan dots). 

(F) Plot showing the hair regeneration response 
versus the size of the plucked field. For all different 
field sizes, 200 hairs are plucked evenly dispersed 
throughout the field. A regenerative response is 
observed when 200 hairs are plucked at a 
density above a threshold (10 hairs/mm²), which 
corresponds to plucking 200 hairs from a 5-mm 
diameter circular surface area (red line). Three 
responses, represented by different colors (gray, tan, 
green), are observed (please see text for explanation). 
The quorum sensing zone is highlighted in 
orange. 

See also Figures S1 and S2. 

Estimating the Range of Action of the Quorum Signal 

Regardless of the physical nature of a signal (e.g., even if it spreads 
from cell to cell via an undirected random walk), its spread can 
usually be modeled as a diffusion process, and it will display a 
decay length, λ, equal to the square root of the ratio between 
its diffusivity (an intrinsic measure of how fast it moves) and the 
rate constant that characterizes its removal or destruction in 
the tissue through which it spreads (Lander, 2007). A physical 
interpretation of λ is the distance over which the steady-state 
signal from a point source falls by a fraction 1 − 1/e, or ~63%. 
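
The definition in the preceding paragraph can be written compactly. The symbols D (diffusivity), k (first-order removal rate constant), c (signal concentration), and s (source term) are notation assumed here for illustration and do not appear in the original text; the decay length is written λ. The exponential profile is the one-dimensional steady-state form away from a localized source, used only to illustrate the ~63% figure.

```latex
\begin{align}
  \frac{\partial c}{\partial t} &= D\,\nabla^{2} c \;-\; k\,c \;+\; s(\mathbf{x}),
  &
  \lambda &= \sqrt{\frac{D}{k}}, \\[4pt]
  c(x) &= c(0)\,e^{-x/\lambda}
  &
  \Longrightarrow\quad
  \frac{c(\lambda)}{c(0)} &= \frac{1}{e} \approx 0.37 .
\end{align}
```

That is, one decay length from the source the signal has fallen by 1 − 1/e ≈ 63% of its value at the source.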

The results of modeling plucked hairs as an array of point sources 
of a diffusible distressor in a 2D medium (Figure 2A; see also 
Extended Experimental Procedures) tell us that the expected 
steady-state distressor concentration should be a function not 
only of plucking density, but also of injury field size, shape, and 
λ. For example, with a constant plucking density, concentrations 
of distressor should rise as a function of field size/λ, leveling off 
as that ratio gets large (Figure 2A). Assuming that regeneration is 
triggered when the distressor concentration around an HF exceeds 
a certain threshold, these results suggest that one could 
estimate the value of λ from a series of experiments in which 
plucking density and injury size/shape are both varied. 
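
A minimal sketch of the leveling-off behavior described above, assuming a continuum idealization in which the plucked follicles are smeared into a uniform source over a circular field. Under that assumption, the steady-state concentration at the field center, relative to an infinitely large field plucked at the same density, is the integral of u·K0(u) from 0 to (field radius / decay length), where K0 is the modified Bessel function of the second kind. The function names and the numerical integration scheme are choices made here, not the authors' implementation.

```python
import math


def bessel_k0(x: float, t_max: float = 20.0, steps: int = 1000) -> float:
    """Modified Bessel function K0(x) for x > 0, from the integral
    K0(x) = integral_0^inf exp(-x * cosh t) dt, using the trapezoid rule."""
    h = t_max / steps
    total = 0.5 * math.exp(-x)  # endpoint t = 0, where cosh(0) = 1
    for n in range(1, steps + 1):
        total += math.exp(-x * math.cosh(n * h))
    return h * total


def center_concentration(radius_in_decay_lengths: float, steps: int = 200) -> float:
    """Concentration at the center of a uniformly plucked disk, relative to
    an infinitely large field plucked at the same density."""
    du = radius_in_decay_lengths / steps
    # Midpoint rule for integral_0^a u * K0(u) du; midpoints avoid u = 0.
    return du * sum(
        (i + 0.5) * du * bessel_k0((i + 0.5) * du) for i in range(steps)
    )


for a in (0.25, 0.5, 1, 2, 4, 8):
    print(f"field radius = {a:>4} decay lengths -> "
          f"relative center concentration {center_concentration(a):.3f}")
```

The printed values rise with field size and saturate once the field radius is several decay lengths, which is the qualitative behavior the text attributes to Figure 2A.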

For example, in Figure 2B, data on whether regeneration in circular 
injury fields occurred (green dots) or failed (red dots) were 
tabulated as a function of field radius and plucking density 
(plotted, in this case, as the inverse of the plucked fraction). Fitting 
the boundary between positive and negative data to the predictions 
of the steady-state diffusion model yields estimates of λ 
between 0.6 and 1.6 mm (Extended Experimental Procedures). 

The same model also predicts that, for sufficiently large 
plucked fields and/or sufficiently high plucking densities, imme- 
diate regenerative responses should not be limited to the precise 
boundaries of the injury field, but should extend a small distance 
beyond those boundaries (here we refer only to regeneration that 
occurs at the same time as that within the injury field and not 
what is triggered significantly later by hair wave propagation). 
Careful examination of experimental data showed that a small 
rim of early regeneration indeed occurred just outside of some 
injury fields. Fitting the sizes of these rims to the model 
(Figure 2C) yields an independent estimate of λ = 1 mm. 

Finally, the same diffusion model suggests that λ can also be 
estimated by holding both plucking density and injury field area 
constant, but varying the shape of the injury field. To test this 
prediction, experiments were carried out in which 50 hairs were 
plucked evenly at a density of every other hair, either in a straight 
line (Figure 2D), a narrow rectangle (6:1 aspect ratio; Figure 2E) or 
a square (Figure 2F). Under these distinct topological conditions, 
plucked single rows never regenerated, while squares always 
regenerated robustly. Rectangles occasionally exhibited modest 
regeneration, suggesting a distressor concentration very close 
to threshold under these circumstances. Fitting these three 
behaviors requires a value of λ between 0.7 and 1.2 mm. 
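
The shape dependence described above can be illustrated with a discrete point-source sketch. Everything here is an assumption made for illustration: follicles are placed on a square lattice, every other follicle is plucked in a checkerboard pattern, each plucked follicle contributes a steady-state signal proportional to K0(distance / decay length), and a 25 × 4 lattice stands in for the roughly 6:1 rectangle. The 0.15 mm spacing and 1 mm decay length are the values quoted in the text; source strength is left in arbitrary units and no threshold value is assumed.

```python
import math
from functools import lru_cache

SPACING_MM = 0.15       # inter-follicular distance quoted in the text
DECAY_LENGTH_MM = 1.0   # decay length estimate quoted in the text
LAMBDA = DECAY_LENGTH_MM / SPACING_MM  # decay length in lattice units


@lru_cache(maxsize=None)
def signal_at_squared_distance(d2: int) -> float:
    """K0(d / LAMBDA) for lattice distance d = sqrt(d2), computed from
    K0(x) = integral_0^inf exp(-x * cosh t) dt with the trapezoid rule."""
    x = math.sqrt(d2) / LAMBDA
    step, n_steps = 0.02, 1000
    total = 0.5 * math.exp(-x)
    for n in range(1, n_steps + 1):
        total += math.exp(-x * math.cosh(n * step))
    return step * total


def checkerboard_field(width: int, height: int):
    """Follicles on a width x height lattice; every other one is plucked."""
    sites = [(i, j) for i in range(width) for j in range(height)]
    plucked = [s for s in sites if (s[0] + s[1]) % 2 == 0]
    unplucked = [s for s in sites if (s[0] + s[1]) % 2 == 1]
    return plucked, unplucked


def peak_signal(width: int, height: int) -> float:
    """Largest summed signal felt by any unplucked follicle in the field."""
    plucked, unplucked = checkerboard_field(width, height)
    return max(
        sum(signal_at_squared_distance((qx - px) ** 2 + (qy - py) ** 2)
            for px, py in plucked)
        for qx, qy in unplucked
    )


SHAPES = {"line": (100, 1), "rectangle": (25, 4), "square": (10, 10)}
for name, (w, h) in SHAPES.items():
    n_plucked = len(checkerboard_field(w, h)[0])
    print(f"{name:>9}: {n_plucked} plucked, "
          f"peak signal {peak_signal(w, h):.1f} (arbitrary units)")
```

Evaluating the signal only at unplucked follicles avoids the singularity of K0 at zero distance. With 50 plucked hairs in each shape, the sketch ranks the peak signal as line < rectangle < square, the same ordering as the observed regeneration outcomes.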

The good agreement among these three methods supports 
the validity of the steady-state diffusion model for describing 
distressor spreading and places the value of λ at ~1 mm. Fitting to 
the model does not imply, however, that the distressor is a single 
substance, or even a diffusible molecule, but simply that it 
spreads according to the same rules. In fact, the observed 
magnitude of λ suggests that the distressor is not simply a diffusible 
receptor-binding molecule, since these typically display 
decay lengths of one tenth this magnitude or less (Teleman 
and Cohen, 2000; Müller et al., 2012; Sarris et al., 2012; Weber 
et al., 2013; Shimozono et al., 2013). As described below, further 
investigation of the molecular nature of the distressor signal 
supports the idea that it consists of both diffusible molecules and 
recruited cells that migrate actively between follicles. 

Plucking Induces a Cascade of Inflammatory, Cellular, 
and Molecular Events 

Results from wax-stripping experiments indicate that HF kerati- 
nocytes undergo apoptosis ~4 hr after injury (Ito et al., 2002; see 
also Figure 3A). To identify molecules and mechanisms that 
might be involved in plucking-induced regeneration, we carried 
out microarray analysis of plucked fields at 12, 24, 48, and 
96 hr after injury. Among the notable, time-dependent changes 
in gene expression, we observed: 

(1) Transient increase in expression of pro-inflammatory 
cytokines. Immune, inflammatory, and wound healing 
response genes constitute the major portion of early 
transcriptional activity following plucking. Analyzing the most 
altered genes by RT-PCR, we found that the immune cytokines 
chemokine (C-C motif) ligand 2 (CCL2), chemokine 
(C-X-C motif) ligand 2 (CXCL2), and interleukin 1 beta 
(IL-1β) were upregulated soon after plucking (i.e., 12 hr), 
although expression of these genes peaked at different 
times (Figure 3B). For example, CCL2 expression peaked 
around 12 hr (Figure 3B). 

(2) Reduced refractory telogen inhibitor expression. During 
refractory telogen, the extra-follicular macro-environment 
expresses high levels of inhibitors, including Bmp2, Dickkopf 
(Dkk1), and secreted frizzled-related protein 4 (Sfrp4), that 
block anagen re-entry and hair wave propagation (Plikus 
et al., 2008, 2011). Expression of Sfrp4, a representative 
gene, decreased markedly at day 1 but rebounded by 
days 2 and 4 (Figure 3B). 

(3) Increased tumor necrosis factor alpha (Tnf-α) expression. 
Tnf-α increased between 1 and 2 days after plucking and 
reached a plateau at ~day 2 (Figures 3B and 3C). The 
plateau of Tnf-α expression corresponds to a time when 
activated hairs are in early anagen phase (Figure 3C). 
We also observed changes in other molecular pathways. 
For example, platelet derived growth factor A (Pdgf-a) 
increased at later stages after plucking (day 4), compatible 
with published results (Festa et al., 2011) (Figure 3B). 

Figure 2. Mathematical Modeling Identifies the Decay Length of a Putative Quorum Signal 

(A) Calculated steady-state concentrations for a diffusible substance produced within injury fields in proportion to the numbers of plucked HF. Each curve 
represents a different sized circular injury field, with the red circle placed at the value on the abscissa corresponding to the injury field radius, in units of the 
diffusing substance decay length. Specifically, the 11 curves represent increasing field sizes of 0.25, 0.5, 1, 1.5, 2, 3, 4, 5, 6, 7, and 8 decay lengths. As plucked 
regions grow larger, the value at the boundary asymptotes to one half the value at the center. λ is the decay length. Please see Results and Supplemental 
Information for more explanation. 

(B) Data from a variety of regeneration experiments involving circular wound fields are plotted as a function of the inverse of the plucked fraction (φ) and the radius 
of the wound field. The curve drawn between the points corresponding to cases of successful (green) and unsuccessful (red) regeneration was obtained from the 
equations that produced the curves in (A), by fitting two parameters, the decay length and the threshold concentration for regeneration. The range of possible 
values consistent with the data was manually explored to yield a range of decay length estimates. 

(C) The same model was used as in (B), but the data that were fit consisted of the distances, δ, just beyond the edges of injury fields at which initial regeneration 
was seen (ρ, radius of the injury field; φ, the plucked fraction). The plotted surface represents a least-squares best fit to the data. 

(D-F') Effects of injury field shape. Fifty hairs were plucked evenly, at a density of every other hair, either in a straight line (D and D'), a narrow rectangle (6:1 aspect 
ratio; E and E') or a square (F and F'). In (D')-(F'), a discrete form of the equation used in (A)-(C), in which each HF is modeled as a discrete source, was used to plot 
the steady-state spatial distributions of a distressor released by plucked follicles (distances are plotted in units of the inter-follicular distance, ~0.15 mm). 
Wherever plotted surfaces extend above a regeneration concentration threshold (gray plane), red dots mark the location of each HF indicating successful 
regeneration. The requirement that these curves be consistent with the observed regeneration patterns in all three cases was sufficient to provide yet a third 
estimate of the distressor decay length. See also Supplemental Information on mathematical model. 



Since the Wnt pathway is critical for hair growth (Enshell-Seijffers 
et al., 2010; Lowry et al., 2005), we examined the 
expression of Wnt pathway members, using whole mount 
in situ hybridization, and compared their expression patterns, 
over time, with those of Tnf-α. Wnt6, β-catenin, and lymphocyte 
enhancer factor (Lef-1) were upregulated within new anagen 
follicles at day 4, but not in the extra-follicular dermal 
macro-environment (Figure S3). We also localized Tnf-α expression 
to the extra-follicular dermal macro-environment (Figures 3C 
and S3). 

CCL2 Is a Key Component of the Quorum Signal 

The earliest noted signaling molecule expression change that 
could potentially communicate information from plucked to un- 
plucked follicles was CCL2 (Figure 3B). Immunohistochemistry 
showed that CCL2 is primarily produced by HF keratinocytes 
and accumulates predominantly in plucked follicles (where 
apoptosis occurs), and to a much smaller extent in neighboring 
unplucked follicles, which do not undergo apoptosis (Figures 
4A, 4B, and S4). This induction is transient and diminishes at 
day 5. CCL2 induction in plucked HFs occurred regardless of 
plucking density, so CCL2 was expressed even at the densities 
that failed to launch regeneration. Epidermal staining of CCL2 is 
evident in the 2.4 mm specimen, some is seen surrounding the 
plucked follicle in the 5 mm specimen, but staining is sparse in 
the 8 mm specimen. 

These results are consistent with CCL2 expression providing 
an overall measure of the extent of plucking and therefore potentially 
serving as a quorum signal. To test whether CCL2 function 
is required for follicle regeneration, we waxed whole back skin 
from both wild-type C57BL/6 and CCL2 null mice. In contrast 
to the localized plucking of 200 hairs (which induces hair regeneration 
after ~12 days) (Plikus et al., 2008), wax-stripping the 
whole back skin drives telogen hairs back into full anagen 
(anagen VI) within ~6 days (Müller-Röver et al., 2001). However, 
when the backs of CCL2 null mice were wax-stripped, follicles 
remained in telogen 3 days after waxing and were still 
in anagen III to IV at day 6 (Figures 4C and S5A). This delayed 
hair regrowth in CCL2 null mice following plucking supports 
a role for CCL2 in plucking-induced hair regeneration. 

Figure 3. Identification of Macro-Environmental Modulators following Hair Plucking 

(A) TUNEL assay to measure apoptosis. 

(B) Real-time PCR from extra-follicular macro-environmental 
tissues revealed the kinetics of gene 
expression induced by plucking (normalized to 
GAPDH with 40 cycles; data are represented as 
mean ± SD, n = 3). 

(C) Whole mount in situ hybridization showed that 
Tnf-α is markedly upregulated in the inter-follicular 
area beginning 2 days after wax stripping. 

See also Figures S3 and S4. 

TUNEL staining performed 1 day after 
plucking revealed that both wild-type 
and CCL2 null mice showed apoptotic 
HF cells (Figures 4D and S5B). These re- 
sults indicate that CCL2 is not required 
for the initial injury response of HFs, but 
rather its expression is triggered by that 
response, whereupon it plays an impor- 
tant role in regeneration. This view is 
consistent with a recent study showing 
that various HF regions express chemo- 
kines including CCL2, CCL20, and CCL8 
in response to stress (Nagao et al., 
2012a). The percent HF area with CCL2 
expression was highest 1 day after pluck- 
ing and decreased thereafter (Figure 4E). 
Unplucked follicles located within (x) or 
outside (y) of the plucked field showed 
low and no CCL2 expression, respectively. CCL2 null mice did not 
express CCL2 after plucking (z). 



M1 Macrophages Are Mediators Recruited by CCL2 to 
Execute Quorum-Sensing Behavior 

The decay lengths of most diffusible signaling molecules, in- 
cluding chemokines (Sarris et al., 2012; Weber et al., 2013), are 
much shorter than the decay length we measured for the pluck- 
ing-induced quorum signal (Figure 2). Chemokines, however, 
are known to act as chemo-attractants for immune cells, and 
we postulated that this might play a role in boosting the effective 
range of action of an initial quorum signal. CCL2 in particular is a 
potent recruiter of monocyte/macrophage lineage cells. 

Indeed, 2 days after plucking, macrophages had heavily infiltrated 
the plucked skin (Figure 5A). We quantified the macrophage 
distribution at different times after plucking (Figure 5E). 
At day 1, F4/80 positive macrophages accumulate around and 
between the plucked follicles. At day 3, more macrophages 
spread to the inter-plucked follicular regions, and their density 
is substantially elevated (at least four times over background), 
reaching up to 66% of that found for plucked follicles. The spread 
can span a distance of 1 mm. These macrophages start to dissipate 
at day 7 post-plucking. 

Figure 4. CCL2 Is Involved in Plucking-Induced Hair Regeneration 

(A) HF keratinocytes showed higher CCL2 expression 
(green) in plucked follicles (red arrow) 
than in unplucked follicles (white arrowhead). The 
circle with purple dots indicates the topology of 
plucked follicles (see also Figure 1E). Peak 
expression occurs 1-3 days after plucking, and 
no marked difference among the 2.4, 5, and 
8 mm groups was noted. Asterisk represents 
regenerating HFs. 

(B) Double immunostaining for K14 and CCL2 of 
samples 3 days after plucking showed that HF 
keratinocytes in plucked follicles are the main 
source of CCL2. 

(C) Hair re-growth is retarded when hairs are 
plucked from CCL2 null mice. 

(D) CCL2 null mice showed similar apoptotic HF 
cells following plucking as wild-type mice, but 
could not induce CCL2 in apoptotic HF cells. 

(E) Graph showing the percentage of HF area 
expressing CCL2 at 1, 3, 5, and 7 days post-plucking 
as well as in unplucked HFs within (x) and outside 
(y) of the plucked field. CCL2 null mice do not 
express CCL2 (z). (n = 3). Data are represented as 
mean ± SD. 

See also Figure S5. 

To test whether macrophages play a functional role in 
plucking-induced hair regeneration, we used chemical inhibitor and 
genetic deletion assays. The application of clodronate liposomes 
to suppress macrophage function caused an ~12-day 
delay in plucking-induced hair regeneration (Figure 6F). For the 
genetic approach, we used LysM-Cre;R26R transgenic mice to 
examine the distribution of LacZ positive myeloid cells following 
hair plucking. Myeloid cells are nearly absent 
in normal mice, but are induced at 
days 3-5 and diminish at day 7 in the 
transgenic mice (Figure 5F). 

To evaluate their role, we generated a 
triple transgenic mouse model where 
myeloid cells are specifically depleted by 
diphtheria toxin upon doxycycline treat- 
ment (LysM-Cre;Rosa-rtTA;TetO-DTA). 
We plucked 200 hairs/5 mm diameter re- 
gion, which usually launches a quorum 
sensing response, leading to regenera- 
tion. In this mutant, hair regeneration did 
not occur (Figures 5G and S2D). These 
myeloid cells represent mainly macro- 
phages, although technically we cannot 
rule out other cell types completely. All 
together, the data suggest macrophages 
play a major role in this process. 

Macrophages can be divided into two 
major types: M1 macrophages (classically 
activated) exert proinflammatory activities, and M2 macrophages 
(alternatively activated) are involved in resolving inflammation 
(Gordon, 2003; Willenborg et al., 2012). Immunostaining 
showed that M1, but not M2, macrophages were present 
5 days post-plucking (Figures 5B, 5C, and S6A). 

These findings are consistent with other studies implicating 
chemokines in the recruitment of inflammatory macrophages 
during wound healing and a role for such cells in tissue 
repair (Willenborg et al., 2012). Since M1 macrophages 
express CCR4 (Figures 5D and S6A), the receptor for CCL2, 
we think that these macrophages are recruited to plucked 
follicles by plucking-induced CCL2. Consistent with this view, 
plucking failed to induce the accumulation of M1 macrophages 
in CCL2 null mouse skin (Figure 5H). These data 
support the model that CCL2 expressed by plucked follicles 
recruits CCR4-expressing M1 macrophages, which play an 
essential role in regeneration. We next explored how this 
comes about. 

Hair Regeneration Induced by Quorum Sensing Is Tnf-α 
Dependent 

Macrophages are known to produce Tnf-α. Tnf-α mRNA was 
induced ~2 days after plucking, before anagen initiates (Figures 
3B, 3C, and S3). Low power whole mount in situ hybridization 
reveals that, under conditions in which regeneration occurs, 
Tnf-α is enriched in the extra-follicular environment of both 
plucked and unplucked hairs (Figure 6A). Double immunostaining 
showed that these cells are indeed M1 macrophages 
(Figures 5B, 5C, 5D, 6B, and S6). 

Semiquantitative analysis of immunostained specimens obtained 
from regions of different plucking densities supports the 
view that Tnf-α expressing macrophages accumulate around 
HFs that regenerate after plucking (both plucked and unplucked), 
but do not accumulate under plucking conditions 
that fail to activate hair regeneration (Figure S6D). 

Quantitative measurements of macrophage-derived Tnf-α 
immunoreactivity over time in a threshold-plucking density region 
(200 hairs/5 mm diameter) showed that Tnf-α positive cells 
are induced around plucked follicles. They then increased significantly 
at days 3 and 5 after plucking, spreading into the dermal 
region between plucked follicles. They decreased at day 7 to 
approach basal levels (Figures 5E, 6C, and S7B). 

To investigate the functional importance of Tnf-α in 
plucking-induced hair regeneration, beads coated with Tnf-α-related 
peptide were injected into refractory telogen stage mouse skin. 
Hair regeneration was induced, followed by propagation to the 
surrounding region (Figure 6E). Control bead injection did 
not induce hair regeneration even after 30 days (Figure 6G). 
Conversely, when hairs were plucked at high density in Tnf-α 
null mice (Figure 6F), a 15-day delay in regeneration was 
observed. These results indicate that Tnf-α is one of the major 
players in plucking-induced hair regeneration. 

Last, we searched for molecules and signals that might function 
further downstream in hair regeneration. For example, Tnf-α 
is known to stimulate both JNK and NF-κB (nuclear factor 
kappa-light-chain-enhancer of activated B cells) signaling. It is 
also known that activation of the FGF signaling pathway can 
trigger hair regeneration (Greco et al., 2009). We therefore 
screened inhibitors of NF-κB, JNK, PI3K, FGF receptor, p38 
MAPK, and Erk for effects on plucking-induced hair regeneration. 
Only NF-κB inhibitors delayed hair regeneration, doing so 
by 10 days (Figures 6H and S7C). In addition, Tnf-α-related peptide 
significantly stimulates the expression of Wnt3, Wnt10a, 
and Wnt10b in keratinocytes (Figure 6I). Although the Eda-NF-κB 
pathway is important in hair development, a previous study 
indicated that Eda participates in anagen to catagen transition 
during the postnatal hair regeneration cycle (Fessing et al., 
2006). Hence, it is not likely that Eda is involved in the 
plucking-induced hair regeneration response. Together the results raise 
the possibility that Tnf-α, acting through the NF-κB pathway, 
ultimately stimulates hair regeneration through activation of Wnt 
signaling. 

DISCUSSION 

Social Behaviors in an Organ Population 

Many organs are composed of repeated, semi-autonomous 
tissue units, such as acini, crypts, and follicles. The potential 
for dynamic coupling between the behaviors of such units 
creates opportunities for collective phenomena. A dramatic 
example of this is the “hair wave,” a coordinated hair cycle 
wave that can travel across the skin of mammals (Suzuki et al., 
2003; Plikus et al., 2008, 2011; Murray et al., 2012). 

In this work, we characterize another collective behavior of 
HFs: density- and topology-dependent, plucking-induced 
regeneration, which can be viewed as a form of quorum sensing. 
Quorum sensing is a process whereby a population makes a collective 
decision based on the number or density of individuals 
that meet a certain criterion. Typically, a response occurs only 
when a threshold is exceeded. Quorum sensing has been 
invoked to describe bacterial cell-to-cell communication 
(Bassler, 2002) that serves to influence gene regulation in response to 
population density fluctuations (Miller and Bassler, 2001). Synthetic 
quorum sensing circuits in yeast were used to demonstrate 
the diversity of social behaviors that can come from collective 
communication (Youk and Lim, 2014). Quorum sensing also 
has been used to explain the collective decision-making 
behavior of social insects such as ants and honey bees (Pratt, 
2005; Visscher, 2007). 



Figure 5. CCL2 Stimulates Tnf-α Production by Attracting CCR4 (+) M1 Macrophages 

(A) Tnf-α is upregulated in the dermal macro-environment on day 2 after wax stripping. Tnf-α in the dermal macro-environment is produced by both dermal 
macrophages (F4/80+ cells, yellow arrow) and adipose cells (red arrow; see also Figure S5C). Few macrophages (yellow arrow) are present at hour 4 and day 10 
after plucking. These macrophages do not express Tnf-α. 

(B and C) Staining shows that Tnf-α is mainly produced by M1 (iNOS-positive) rather than M2 (Arginase-positive) macrophages. 

(D) Tnf-α (+) cells express CCR4 in response to CCL2. 

(E) The number of F4/80+ cells is highest near plucked follicles and their density decreases with increasing distance from the plucked follicles. See Figure S7A for 
the unit area we quantified for each data point. The number of F4/80+ cells is rapidly elevated at day 1 post-plucking, reaching a maximum at day 3 and then 
diminishing at 5 and 7 days after plucking. 

(F) LysM-Cre;R26R reporter mice show that the myeloid lineage-derived cells are mostly induced in the dermis around plucked HFs. 

(G) When 200 hairs were plucked from a 5-mm region in myeloid cell-deficient mice, hair regeneration could not be induced. 

(H) Tnf-α (+) cells were not induced in CCL2 null mice. 

(I) Tnf-α serum levels are similar between wild-type and CCL2 null mice. Data are represented as mean ± SD. 

See also Figures S6 and S7. 
Molecular Nature of the Quorum-Sensing Circuit 

Briefly, the quorum sensing circuit we describe here provides a 
way for injured HFs to collectively assess the magnitude and 
extent of injury that the skin has sustained and make an all-or- 
none decision whether or not to regenerate. A striking feature 
of this circuit, revealed through molecular modeling, is that the 
information being shared among follicles decays with a charac- 
teristic length of ~1 mm, substantially greater than the measured 
decay lengths of diffusible signaling molecules (Teleman and 
Cohen, 2000; Muller et al., 2012; Sarris et al., 2012; Weber 
et al., 2013; Shimozono et al., 2013). The explanation for this 
apparent paradox seems to reside in the multi-stage nature of 
the quorum signal, which begins with diffusible molecules, but 
eventually involves the recruitment of motile cells (inflammatory 
macrophages) that spread within the tissue. Below we summa- 
rize the sequence of molecular and cellular events revealed by 
the present study (Figure 7). 

(1) Micro-injury and inflammation. Hair plucking leads to hair 
keratinocyte apoptosis (Ito et al., 2002) (Figure 3A). This in 
turn leads to inflammatory changes and to the localized 
overexpression of several inflammatory cytokines, espe- 
cially CCL2, which may be detected within 12 hr post- 
plucking (Figures 3B and 4). 

(2) Molecular signal release and dissemination. CCL2 and 
other cytokines are secreted from plucked follicles, and 
the epidermis around the plucked follicle may also be involved. 
The importance of CCL2 is demonstrated by the fact 
that, in CCL2 null skin, regeneration is markedly delayed 
(Figure 4). The fact that it is not prevented entirely sug- 
gests that some other cytokines induced by plucking 
may act redundantly with CCL2. 

(3) Recruitment of macrophages as motile vectors. The local 
production of CCL2 appears to recruit CCR4 (+) M1 macrophages 
in the dermis (Figure 5). Whereas macrophages 
initially appear to be enriched around plucked follicles, re- 
cruited macrophages soon spread throughout the whole 
region. By relaying a signaling response with a motile 
cellular vector, HFs effectively solve the problem of 
spreading quorum information over long distances. 

Another motile vector candidate is the epidermal dendritic 
Langerhans cell since it also expresses F4/80 antigen. How- 
ever, F4/80 positive cells appear in dermis at day 1, but do 
not appear in the epidermis until days 5-7. Our microarray 
data also did not reveal upregulation of Langerhans cell 
markers, such as CD207 (Langerin) and CD11b. Although we 
do not completely rule out the involvement of Langerhans cells in 
this process, our data so far suggest a major role for dermal 
macrophages. 

It is worthwhile to mention here that more examples of 
extended cellular processes that mediate signal communication 
are being identified. For example, in zebrafish stripe pattern 
formation, pigment cells can utilize their contact-dependent 
depolarization and repulsive behavior as non-diffusible inhibitors 
that follow Turing principles (Inaba et al., 2012). In Drosophila 
epithelia, cytonemes can establish a dynamic hedgehog mor- 
phogen gradient that may reach afar (Bischoff et al., 2013). 



Future in vivo imaging studies of the mouse skin model studied 
here will allow us to elucidate the interactive cellular behaviors 
between HFs, immune system and regeneration. 

(4) Release of Tnf-a and collective regeneration. Inflamma- 
tory macrophages that are recruited to wound fields 
secrete Tnf-a, which has been shown to activate hair cy- 
cle regeneration (Figure 6E) (Duheron et al., 2011). Regen- 
eration is greatly impaired in Tnf-a null mice (Figure 6F); 
moreover, Tnf-a serum levels are normal in CCL2 null 
mice that show impaired plucking-induced regeneration 
(Figure 5I). These studies indicate that local, not systemic 
Tnf-a is required for regeneration. Although the exact 
mechanism by which Tnf-a triggers follicle regeneration 
is not clear, the data suggest that Tnf-a may act through 
NF-kB, which in turn activates canonical WNT signaling 
(Figure 6D) (Cawthorn et al., 2007; Schwitalla et al., 
2013). While Tnf-a immunoreactivity is mainly in macro- 
phages, it is also detected in other cell types and may pro- 
vide additional possible mechanisms. The EDAR pathway 
may also activate NF-kB. However, EDAR is more 
involved in anagen/catagen transition (Fessing et al., 
2006), not telogen/anagen transition. Further, it is not 
induced in our microarray data (not shown). 
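
The quorum logic summarized in steps (1)-(4) can be illustrated with a toy calculation. The sketch below is not the model used in this study. It simply assumes that each plucked follicle contributes a signal that decays exponentially with the ~1 mm characteristic length noted above, that contributions add, and that regeneration is triggered when the summed signal crosses a threshold. The threshold value, the lattice geometry, and all function names are hypothetical and chosen only for illustration.

```python
import math

DECAY_LENGTH_MM = 1.0  # characteristic decay length mentioned in the text (~1 mm)
THRESHOLD = 3.0        # hypothetical quorum threshold, for illustration only


def signal_at(point, plucked, decay_length=DECAY_LENGTH_MM):
    """Sum exponentially decaying contributions from every plucked follicle."""
    total = 0.0
    for px, py in plucked:
        distance = math.hypot(point[0] - px, point[1] - py)
        total += math.exp(-distance / decay_length)
    return total


def quorum_reached(point, plucked, threshold=THRESHOLD):
    """All-or-none decision: True only if the summed signal meets the threshold."""
    return signal_at(point, plucked) >= threshold


def grid(n_side, spacing_mm):
    """Square lattice of n_side x n_side plucked follicles (assumed geometry)."""
    return [(i * spacing_mm, j * spacing_mm)
            for i in range(n_side) for j in range(n_side)]


dense = grid(4, 0.5)   # 16 plucked follicles, 0.5 mm apart
sparse = grid(4, 3.0)  # 16 plucked follicles, 3 mm apart

# Evaluate the decision at the center of each lattice.
dense_decision = quorum_reached((0.75, 0.75), dense)
sparse_decision = quorum_reached((4.5, 4.5), sparse)
```

Under these assumed values, the same number of plucked follicles reaches the threshold at the center of the dense lattice but not at the center of the sparse one, which mirrors the density dependence described in the text.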

Adaptive Role of Quorum Sensing 

Depending on the severity of skin injury, the body may use 
different mechanisms to alert, defend, and regenerate the 
damaged tissue. Plucking of a single hair follicle is a micro-injury. 
An open, full-thickness wound is a catastrophic event (i.e., 
macro-injury). While wounded skin is known to induce hair 
regeneration, small and large wounds may share a fundamental 
mechanism, but use different molecular circuits to achieve 
different levels of restoration and regeneration. Indeed, HF activ- 
ities have been linked to the wound-healing and regenerative 
behaviors of the inter-follicular epidermis. This link may be medi- 
ated by the immune system (Paus et al., 1998), macrophage 
recruitment (Osaka et al., 2007), and increased Tnf-a expression 
(Jiang et al., 2010). Interestingly, TNF-a converting enzyme, a 
regulator of Tnf-a, is a component of the HF bulge niche (Nagao 
et al., 2012b). Anagen phase HFs can influence the surrounding 
epidermis to markedly accelerate wound healing (Ansell et al., 
2011). In mice, loss of full thickness skin larger than 1 cm in diam- 
eter could lead to new follicle formation (Ito et al., 2007). How- 
ever, plucking does not launch a full wound healing response, 
so conceptually plucking works differently from the wound and 
our study focuses on a different scale. It provides novel insight 
into how HFs respond to injury at the level of a HF population. 
We analyzed how interactions among HFs and the 
dermal environment reach a binary choice based on a collective 
measurement of injury. We show that effective damage control is 
achieved via co-option of existing signaling mechanisms (e.g., 
Tnf-a, macrophage) for the “social behaviors” of a stem cell 
population. 

In summary, we report a higher level integration of signals 
from hair regeneration, immune cytokines, and wound healing. 
Instead of a top-down process, quorum sensing represents a 
bottom-up process based on local information. Each follicle 
Figure 6. Hair Regeneration Is Proportional 
to the Local Concentration of Tnf-a 

(A) Whole mount in situ hybridization shows Tnf-a 
(brown color in the dermis, red arrows) was 
induced under plucked and unplucked hairs (blue 
arrows) toward the center of the 3 mm plucked 
zone 5 days after plucking. 

(B) Semiquantitative assessment of the Tnf-a 
concentration using the 5 mm group at day 5. Its 
expression level was quantified in three different 
skin regions (green boxes) that differ in their 
proximity to plucked follicles, “a” is closest to the 
plucked follicles and shows the highest Tnf-a 
levels, “b” is away from the plucked follicles and 
shows lower Tnf-a levels, “c” is furthest away and 
shows the least Tnf-a. 

(C) Quantitative assessment of the Tnf-a positive 
cells around plucked follicles and inter-plucked 
follicle dermis. See Figure S7B for complete series. 
The pattern is similar to that of F4/80 macrophage 
distribution (n = 3). 

(D) Density-dependent plucking on Axin-LacZ 
mice shows that the canonical Wnt/β-catenin 
signaling pathway was activated 3 days after 
plucking and the number of LacZ (+) HFs was 
proportional to the plucking density. 

(E) Subcutaneous injection of Tnf-a-related pep- 
tide coated beads during refractory telogen can 
induce anagen re-entry and then propagate to the 
surrounding HFs. 

(F) Tnf-a null mice exhibit a 15-day delay in anagen 
re-entry following plucking of 200 hairs during the 
refractory telogen phase. Intra-peritoneal macrophage 
inhibitor (MI) injection can also delay 
plucking-induced hair regeneration by 12 days. 

(G) Albumin coated beads injection showed no 
anagen re-entry even after 32 days. 

(H) Subcutaneous NF-kB inhibitor injection can 
delay plucking-induced hair regeneration by 
10 days. 

(I) Wnt3, Wnt10a, and Wnt10b were activated in 
keratinocytes by TNF-related peptides. Data are 
represented as mean ± SD. **p < 0.001. 

See also Figures S6 and S7. 
Figure 7. Molecular Basis of Quorum- 
Sensing Behavior during the Activation of 
Hair Stem Cells in the Follicle Population 
becomes a sensor for the population to assess the level of 
damage. The molecular circuit quantifies injury strength by 
summing together local signals from different organs. Here, 
the communication among tissues reaches a larger scale orga- 
nization by coupling local molecular signaling (in the form of a 
chemical gradient) with motile cellular vectors. In this study, 
macrophages are identified as a motile vector that allows a 
length scale of up to 1 mm. In this manner, the injury response 
is measured and reflects local needs. This study may just be 
one of the examples that reveal collective cellular behaviors 
in response to physiological or pathological stimuli. We believe 
that the quorum sensing behavior principle is likely to be 
present in the regeneration of tissue and organs beyond the 
skin. 

EXPERIMENTAL PROCEDURES 
Surgical Procedures 

All procedures were performed on anesthetized animals with protocols 
approved by the University of Southern California Institutional Animal Care 
and Use Committee (USC IACUC). Hair cycle was synchronized by wax stripping 
(Muller-Rover et al., 2001). Hairs in refractory telogen were plucked with 
the spacing indicated in the Results section. Regenerated hairs were 
counted under a dissection microscope. 

RNA Preparation and Microarray 

For microarray analysis, all dermal tissues were collected. RNA was prepared 
using TRI Reagent BD (Sigma-Aldrich) following the manufacturer’s 
recommendations. Please see Extended Experimental Procedures for details. 



The microarray data reported here have been submitted to the GEO 
(accession number GSE46181). Primer sequences for RT-PCR are listed 
in Table S1. 

Perturbation of Quantitative Plucking 

Small-molecule inhibitors or peptides were injected intra-dermally on one side 
of mouse dorsal skin for 4 days. Then 200 hairs were plucked in the center of 
the injected area. After plucking, these drugs were continuously injected for an 
additional 6 days. DMEM was injected into the opposite side as a control. Each 
animal was injected with only one reagent. 

ACCESSION NUMBERS 

The GEO accession number for the microarray data reported in this paper is 
GSE46181. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, seven 
figures, and one table and can be found with this article online at 
http://dx.doi.org/10.1016/j.cell.2015.02.016. 
SUMMARY 

Cholesterol is dynamically transported among organ- 
elles, which is essential for multiple cellular functions. 
However, the mechanism underlying intracellular 
cholesterol transport has remained largely unknown. 
We established an amphotericin B-based assay 
enabling a genome-wide shRNA screen for delayed 
LDL-cholesterol transport and identified 341 hits 
with particular enrichment of peroxisome genes, 
suggesting a previously unappreciated pathway for 
cholesterol transport. We show dynamic membrane 
contacts between peroxisome and lysosome, which 
are mediated by lysosomal Synaptotagmin VII bind- 
ing to the lipid PI(4,5)P2 on peroxisomal membrane. 
LDL-cholesterol enhances such contacts, and 
cholesterol is transported from lysosome to peroxi- 
some. Disruption of critical peroxisome genes leads 
to cholesterol accumulation in lysosome. Together, 
these findings reveal an unexpected role of peroxi- 
some in intracellular cholesterol transport. We further 
demonstrate massive cholesterol accumulation in 
human patient cells and mouse model of peroxisomal 
disorders, suggesting a contribution of abnormal 
cholesterol accumulation to these diseases. 



INTRODUCTION 

Cholesterol, an essential lipid for eukaryotic cells, plays impor- 
tant roles in many cellular processes including membrane prop- 
erties regulation, steroidogenesis, bile acid synthesis, and signal 
transduction. Accounting for ~30%-40% of total cellular lipids, 
cholesterol is dynamically transported in cells and unevenly 
distributed in cellular membrane structures. Only ~0.5%-1% 
of total cellular cholesterol is present in the ER membrane (Lange 
et al., 1999) and its concentration is higher in the Golgi apparatus 
and highest (~60%-80%) in the plasma membrane (PM) (Liscum 
and Munn, 1999). In addition, cholesterol exerts diverse cellular 
functions in different organelles. Sterols in ER control de novo 
cholesterol biosynthesis by inhibiting SREBP processing and 
promoting degradation of HMG-CoA reductase (Goldstein 
et al., 2006). Cholesterol is esterified in ER for storage and lipo- 
protein secretion (Chang et al., 1997; Vance and Vance, 1990) 
and oxidized and converted to steroids and bile acids in mito- 
chondria and peroxisome (Ishibashi et al., 1996). Thus, dynamic 
cholesterol transport in cells is pivotal for multiple cellular 
functions. 

Low density lipoprotein (LDL)-derived cholesterol trafficking 
is a major part of intracellular cholesterol transport with most 
mammalian cells acquiring ~80% of their cholesterol through re- 
ceptor-mediated endocytosis of plasma LDL (Brown and Gold- 
stein, 1986). Upon receptor binding and internalization, LDL is 
delivered from early endosome to late endosome/lysosome 
(L/L), where LDL-derived cholesteryl esters are hydrolyzed to un- 
esterified cholesterol. Free cholesterol then egresses from L/L 
and is further passed to downstream organelles such as the 
PM, ER, and mitochondria to fulfill its functions (Chang et al., 
2006). To date, most mechanistic knowledge on cholesterol pas- 
sage from L/L to other organelles has come from studies of the 
inheritable neuronal degeneration disorder Niemann Pick type 
C (NPC) disease, which is caused by loss-of-function mutations 
in NPC1 or NPC2 genes (Carstea et al., 1997; Sleat et al., 2004). 
NPC patients show severe cholesterol accumulation in multiple 
tissues. NPC1 is a polytopic membrane protein on L/L, whereas 
NPC2 is a luminal protein. After cholesteryl ester is hydrolyzed in 
the lysosomal lumen, NPC2 binds the unesterified cholesterol by 
recognizing the 8-carbon isooctyl side chain. NPC2 then hands 
over the cholesterol molecule to the N-terminal domain of 
NPC1, with the 3β-hydroxyl group buried within the binding 
pocket. The NPC1-bound cholesterol projects through the glycocalyx 
and is inserted into the lysosomal membrane. In NPC1 
or NPC2 mutant cells, cholesterol cannot be incorporated into 
membrane and is therefore accumulated in the lumen (Kwon 
et al., 2009). However, this only accounts for how free cholesterol 
reaches the L/L membrane, and the mechanisms whereby 
cholesterol leaves the lysosomal membrane and moves to other 
organelles remain largely unknown. 

To identify critical proteins for intracellular cholesterol trans- 
port, we developed a cellular system using the antifungal 
Figure 1. Genome-wide RNAi Screen Identifies Genes Involved in Intracellular Cholesterol Transport 

(A) Schematic representation of the screen strategy. 

(B) The cells were treated as shown in (A) and Figure S1C. The PM cholesterol and effect of AmB on cell growth at each time point were determined. 

(C) PM cholesterol levels and survival ratio based on crystal violet staining of each selection round. Results represent the mean ± SD of three independent 
experiments. 

antibiotic amphotericin B (AmB), in which cells only survive 
when they have impaired intracellular cholesterol transport. We 
performed a genome-wide pooled shRNA screen with the AmB 
system and identified over 300 genes affecting cholesterol trans- 
port. The genes encoding peroxisomal proteins were enriched. 
We further demonstrated that peroxisome forms transient lyso- 
some-peroxisome membrane contact (LPMC) with lysosome 
through the binding of peroxisomal lipid PI(4,5)P2 by lysosomal 
protein Synaptotagmin VII (Syt7). Cholesterol can be transported 
to peroxisome from lysosome through LPMC. Consistent with 
the latter findings, we observed drastic cholesterol accumulation 
in the X-chromosomal form of adrenoleukodystrophy (X-ALD) 
mouse model and in fibroblasts from human patients with 
different types of peroxisomal disorders. Our findings therefore 
reveal a fundamental role of peroxisome in intracellular choles- 
terol transport and suggest potential novel strategies for the 
diagnosis and treatment of peroxisome-related diseases. 

RESULTS 

Genome-wide Pooled shRNA Screening for Cholesterol 
Trafficking Defective Cells 

AmB binds to cholesterol in PM and forms pores that lead to 
cytoplasm leakage and cell death (Andreoli, 1973). Based on 
this property, we designed a genome-wide shRNA screen to 
identify genes required for intracellular cholesterol transport, in 
particular the transport of cholesterol from LDL receptor 
(LDLR)-mediated endocytosis. The rationale and overall process 
of the screen are depicted in Figure 1A. There are three key elements, 
namely: (1) inhibition of endogenous cholesterol biogen- 
esis throughout the entire process and delivery of cholesterol by 
LDL particles to focus on the transport of LDL-derived choles- 
terol, (2) synchronization of cells at the stage of high cholesterol 
in L/L and low cholesterol in PM so that the cholesterol can be 
transported to the PM in all cells at a given time point, and (3) 
enrichment of cholesterol trafficking defective (CTD) cells by us- 
ing AmB that kills the cells with proper cholesterol transport in a 
controlled manner. The first key element is achieved by using 
lovastatin to inhibit HMG-CoA reductase and low concentration 
of mevalonate to only permit the synthesis of nonsterol isopre- 
noids essential for cell growth. Lipoprotein-deficient serum is 
also used before LDL delivery so that the cells are in cholesterol 
starvation and the initial LDLR level is very high. The second key 
element is realized by using U18666A, a compound that revers- 
ibly blocks cholesterol efflux from L/L (Liscum and Faust, 1989), 
and cyclodextrin, a cholesterol mobilizing reagent (Liu et al., 
2010; Rosenbaum et al., 2010). Co-treatment of cholesterol- 
starved cells with LDL, lovastatin, and U18666A leads to 
LDLR-mediated endocytosis of large amounts of cholesterol 
which is trapped in L/L by U18666A. After a short exposure to 
cyclodextrin to acutely deplete cholesterol from PM, the cells 
are incubated without U18666A to allow cholesterol transport 
from L/L to PM. AmB is then used to kill the cells with more 
cholesterol in PM. The cholesterol trafficking rate and PM- 
cholesterol level are lower in CTD cells than wild-type (WT) cells 
at particular time points. Thus, these CTD cells can survive AmB 
treatment. 

The procedure described above was validated by comparing 
WT CHO-7 and NPC1-deficient CT43 cells (Figure 1B). Cyclodextrin 
decreased the PM-cholesterol level to 0.59 μg/mg 
protein. After removal of U18666A, the PM-cholesterol level was 
much higher in CHO-7 than in CT43 cells and the former were 
more sensitive to AmB treatment (Figure 1B). To perform the 
screen, HeLa cells were infected with a pooled shRNA library 
and the virus-infected cells were subjected to AmB selection 
as described above (Figure S1A). We observed a gradual 
decrease of PM-cholesterol and an increase of survival rate in the 
first five rounds of selection before reaching a plateau (Figure 1C), 
suggesting that CTD cells were largely enriched. The shRNA 
inserts were then amplified from the CTD cells and subjected 
to deep sequencing. 
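
The rise and plateau of survival over successive selection rounds can be reasoned about with a simple enrichment calculation. The sketch below is only an illustration and is not taken from the study: it assumes that CTD cells and cells with normal cholesterol transport each survive a round of AmB with a fixed probability, and it tracks the CTD fraction of the population from round to round. The starting fraction, both survival probabilities, and the function name are hypothetical.

```python
def enrich(ctd_fraction, survival_ctd, survival_wt, rounds):
    """Return the CTD fraction before selection and after each round.

    Assumes a fixed per-round survival probability for each cell class and
    that survivors regrow without changing the CTD:normal ratio.
    """
    history = [ctd_fraction]
    for _ in range(rounds):
        ctd = ctd_fraction * survival_ctd          # surviving CTD cells
        wt = (1.0 - ctd_fraction) * survival_wt    # surviving normal cells
        ctd_fraction = ctd / (ctd + wt)            # renormalize to a fraction
        history.append(ctd_fraction)
    return history


# Hypothetical values: 1% CTD cells at the start, 80% vs 10% survival per round.
history = enrich(ctd_fraction=0.01, survival_ctd=0.8, survival_wt=0.1, rounds=7)
for round_number, fraction in enumerate(history):
    print(f"round {round_number}: CTD fraction = {fraction:.4f}")
```

With these assumed parameters the CTD fraction climbs steeply over the first few rounds and then flattens as it approaches 1, which is qualitatively the behavior expected when selection has largely enriched the resistant population.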

The RNAi screening identified 341 candidate genes, each of 
which was targeted by two or more small hairpin RNAs (shRNAs), 
reducing the likelihood of shRNA off-target effects. Their symbols and 
basic information are listed in Table S1. 
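
The requirement that a candidate gene be supported by two or more shRNAs can be expressed as a small filtering step. The sketch below is a hypothetical illustration of that rule, not the pipeline used in the study: the input format, the shRNA identifiers, the placeholder gene `GENE_X`, and the function name are all assumptions.

```python
from collections import defaultdict


def call_hits(enriched_shrnas, min_shrnas=2):
    """Return genes supported by at least `min_shrnas` distinct shRNAs.

    enriched_shrnas: iterable of (shrna_id, gene_symbol) pairs recovered
    from the selected cells (hypothetical input format).
    """
    shrnas_by_gene = defaultdict(set)
    for shrna_id, gene in enriched_shrnas:
        shrnas_by_gene[gene].add(shrna_id)  # a set ignores repeated reads
    return sorted(gene for gene, ids in shrnas_by_gene.items()
                  if len(ids) >= min_shrnas)


# Made-up example: two genes have two distinct shRNAs each, one gene has one.
example = [
    ("sh1", "NPC1"), ("sh2", "NPC1"),
    ("sh3", "ABCD1"), ("sh4", "ABCD1"), ("sh4", "ABCD1"),
    ("sh5", "GENE_X"),
]
hits = call_hits(example)
```

Counting distinct shRNA identifiers rather than reads means that a single highly abundant hairpin cannot qualify a gene on its own.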

Analysis and Validation of Screening Results 

To characterize the enriched biological processes and pathways 
in our screen, the 341 gene hits were subjected to gene ontology 
(GO) enrichment analysis and Kyoto Encyclopedia of Genes and 
Genomes (KEGG) database analysis (Figures 1D and 1E). The 
genes involved in lipid metabolism and intracellular transport 
were well represented, constituting 28.7% of total candidates 
(Figures 1D and 1E). Among these hits is NPC1, loss of 
which is well known to trap cholesterol in lysosome and prevent 
cholesterol from traveling to PM. This serves as a positive control 
and suggests our screen was successful. Our screen also recovered 
genes that participate in LDLR expression regulation and 
endocytosis, such as SREBP2, SCAP (Brown and Goldstein, 
1997), LDLR (Brown and Goldstein, 1986), and AP2 associated 
kinase 1 (AAK1) (Conner and Schmid, 2002). Because silencing 
of these genes prevents cells from taking up LDL, their appearance 
in the candidates list was expected. 
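
Enrichment of a category among screen hits is commonly scored with a one-sided hypergeometric test. The text does not state which statistic was used here, so the sketch below is only a generic illustration of that idea. The function name is hypothetical, and the background size and category size in the usage example are placeholders rather than values from the study.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) when drawing `sample` genes without replacement from
    `population` genes, of which `annotated` belong to the category."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(comb(annotated, i) * comb(population - annotated, sample - i)
               for i in range(overlap, upper + 1))
    return tail / total


# Small case that can be checked by hand: (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3)
p_small = hypergeom_enrichment_p(population=10, annotated=4, sample=3, overlap=2)

# Placeholder numbers: 341 hits drawn from an assumed 20,000-gene background,
# with 9 hits falling in an assumed 100-gene category.
p_example = hypergeom_enrichment_p(population=20000, annotated=100,
                                   sample=341, overlap=9)
```

Exact integer binomials are used so that the small case reduces to 40/120; for large gene sets a dedicated statistics library would normally be preferred.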

Unexpectedly, we found marked enrichment for genes 
associated with neurological diseases, peroxisome, calcium, 
transcription/RNA processing, immune response, cell adhesion, 
Hh pathway, ubiquitin-mediated proteolysis, and purine meta- 
bolism. It is interesting that neurological disease-related genes 
are discovered in our screen to affect cholesterol transport. As 
exemplified by NPC disease, which is characterized by severe 
neurological symptoms secondary to cholesterol accumulation 
in lysosome, the neuron is particularly sensitive to cholesterol 
alteration, and impaired cholesterol transport may be a mecha- 
nism shared by these neurological diseases. 



(D) Bioinformatics classification of the hits into biological processes and molecular functions categories. The number in the bracket shows the number of genes in 
each category. 

(E) Statistically enriched biological processes superimposed on a sketch depicting a cell, with the corresponding p value of GO analysis in the screen. Genes in 
red refer to representative hits. 

See also Figure S1 and Table S1. 
Figure 2. Peroxisome Forms Transient and Dynamic Membrane Contacts with Lysosome 

(A) Knockdown of the peroxisome genes identified in the screen led to cholesterol accumulation and decrease of PM cholesterol levels. The “+” indicates the 
degree of cholesterol accumulation; the “−” indicates no obvious cholesterol accumulation. 

(B) SV589 cells transfected with indicated siRNAs were stained with filipin (red) and antibody against endogenous LAMP1 (green) or PMP70 (green). Scale bar, 
10 μm. LAMP1: lysosome marker, PMP70: peroxisome marker. 

(C) HeLa cells transfected with mouse ABCD1-mCherry were assessed by immunostaining with antibody against PMP70 (green) or LAMP1 (green). Scale bar, 
10 μm. 

(D) Quantification of colocalization of ABCD1 with organelle-specific markers shown in (C) and Figure S3A. GM130, Golgi marker; EEA1, early endosome marker. 
Data represent mean ± SD (n = 4, 35 cells per independent experiment). 

To further confirm the hits, we selected 30 representative 
genes covering all 14 classes and validated them using distinct 
shRNA sequences. The survival rate of knockdown cells was 
dramatically higher upon AmB treatment as compared with control 
cells (Figure S1D). Among the 30 representative genes, individual 
knockdown of 27 genes caused PM-cholesterol content 
to decrease by >50% (Figure S1F). Fifteen genes exhibited 
markedly enhanced cholesterol accumulation in cells as shown 
by filipin staining (Figure S1G). These results confirmed the reliability 
of our screen. 

Intriguingly, genes encoding peroxisomal proteins were statistically 
enriched (Figures 1D and 1E). When the peroxisomal hits, 
including ABCD1, ACOT8, BAAT, TMEM135, PEX1, PEX3, 
PEX6, PEX10, and PEX26, were individually knocked down, the 
PM-cholesterol level significantly decreased by 35%-84% as 
compared to control (Figure 2A). Cholesterol accumulation was 
observed in lysosome, but not peroxisome (Figures 2B and S2B). 

Peroxisome Forms Transient and Dynamic Contacts 
with Lysosome 

How can depletion of peroxisomal proteins lead to cholesterol 
accumulation in lysosome? To answer this question, we used 
ABCD1, a peroxisomal membrane protein and also one of the 
strongest hits from our screen, as a representative to investigate 
the mechanism. 

ABCD1 mainly colocalized with the peroxisome marker 
PMP70 as expected. However, a significant amount of colocalization 
between ABCD1 and the lysosome marker LAMP1 was, surprisingly, 
observed (Figures 2C, 2D, and S3A). Using GM130 as a 
marker for the Golgi apparatus and EEA1 and Rab5 as early endosome 
markers, we found the lysosome-peroxisome contact 
was very specific as there was little detectable association between 
peroxisome and these two organelles (Figures 2D and 
S3A). Is the apparent colocalization of lysosome and peroxisome 
due to the sporadic distribution of ABCD1 in lysosome? SKL is a 
strong peroxisome localization signal and the EGFP-His6-SKL 
protein is widely used to label peroxisome. We analyzed other 
peroxisome markers such as transfected EGFP-His6-SKL and 
endogenous PMP70 to rule out potential interference of a particular 
marker or antibody and found a similar partial colocalization 
between lysosome and peroxisome (Figures S3A-S3C). 

We took extra caution to further validate this phenomenon us- 
ing 3D reconstitution, super resolution structured illumination mi- 
croscopy (SR-SIM), and electron microscopy. 3D reconstitution 
and high resolution confocal images showed that the small 
membrane interaction between lysosome and peroxisome could 
indeed be observed using different microscopic methods (Fig- 
ures 2E and 2F). Moreover, lysosome and peroxisome formed 
contacts in primary mouse hepatocytes detected by transmis- 
sion electron microscopy (Figure 2G). With these validations, 
we named this phenomenon lysosome-peroxisome membrane 
contact (LPMC). To our knowledge, the LPMC has not been 
reported before. 

Time-lapse microscopy was next employed to understand the 
LPMC dynamics in living cells. It revealed that the contact between 
lysosome and peroxisome was only transient. In a time 
frame of a few dozen to 100 s, a particular peroxisome formed 
a contact with one lysosome, then was released and moved 
away. It could then associate with another lysosome in a similar 
time frame (Figure 2H; Movies S1 and S2). Notably, we observed 
no fusion of lysosome with peroxisome (Figure 2H). Consistently, 
a lysosomal matrix protein such as NPC2 was not detected in 
peroxisome when LPMC formed (Figure S3B). 

To further validate the LPMC, we designed an organelle co-precipitation 
assay (Figure S3D). The cells stably expressing 
EGFP-His6-SKL were lysed without disturbing organelle integrity, 
and the membrane fractions were incubated with Ni Sepharoses 
to pull down peroxisome. The isolated fractions were 
then examined by fluorescent imaging of Ni Sepharoses and 
western blot. As shown in Figure S3E, NPC1-mCherry-labeled 
lysosome (red) could be observed on the beads covered by 
peroxisome (green). On the other hand, mCherry-Rab5-labeled 
early endosome was not co-precipitated, suggesting the LPMC 
was specific. In line with these results, western blot analysis 
showed that the lysosomal protein LAMP1 was efficiently co-precipitated 
with peroxisome, but markers for other organelles were 
not (Figure S3F). Together, these lines of evidence strongly 
demonstrate the presence of LPMC. 

We next examined if LPMC is regulated. Knockdown of NPC1 
or ABCD1 significantly decreased the LPMC, with this effect being 
evident using both cell imaging and organelle co-precipitation 
methods (Figures 3A-3C). Depletion of other peroxisomal 
functional proteins such as PEX1 also led to less LPMC (Figures 
S2B and S2C). More importantly, the LPMC was significantly 
reduced under cholesterol depletion, and this reduction 
could be time-dependently reversed by cholesterol repletion 
from LDL (Figures 3D-3F). Knockdown of LDLR, Clathrin heavy 
chain (CHC), or co-depletion of adaptor protein genes including 
AP2 subunit alpha 2, ARH, and Dab2 to inhibit LDL endocytosis 
not only attenuated lysosomal cholesterol replenishment, but 
also decreased the LPMC (Figures 3G and 3H). These results 
suggest that cellular cholesterol content regulates LPMC, which 
also requires proper functions of lysosome and peroxisome. 

Synaptotagmin VII Is a Lysosomal Protein Binding 
Peroxisome 

We next sought to identify the molecules bridging LPMC. A 
multi-arm proteomics approach was employed to analyze lyso- 
somal membrane proteins, peroxisomal proteins, and NPC1 
interacting proteins (Figures S4A and S4B; Table S2). After merg- 
ing of the protein lists, candidates involved in vesicle fusion or 
organelle dynamics were selected as refined candidates for 



(E) A representative SR-SIM image of the overlaid endogenous LAMP1 (green) and PMP70 (red) images. Arrowheads indicate LPMC sites. Scale bar, 10 μm. 

(F) HeLa cells were immunostained with antibodies against LAMP1 and PMP70 and analyzed by Volocity-3D software. Arrowheads indicate LPMC sites. Scale 
bar, 10 μm. 

(G) Transmission electron micrograph of the LPMC in a mouse liver cell. L, lysosome, P, peroxisome. Scale bar, 500 nm. 

(H) SV589 cells were transfected with EGFP-SKL and NPC1-mCherry. Time-lapse images were acquired. Scale bar, 500 nm. See also Movie S2. 

See also Figures S2 and S3 and Movies S1 and S2. 
Figure 3. Regulation of Lysosome-Peroxisome Membrane Contacts 

(A) SV589 cells were transfected with indicated siRNAs and immunostained with antibodies against LAMP1 (green) and PMP70 (red). Scale bar, 10 μm. 

(B) Quantification of LPMC in (A). Data represent mean ± SD. **p < 0.01, one-way ANOVA (n = 4, 35 cells per independent experiment). 

(C) Lysosome-peroxisome association revealed by organelle co-precipitation assay. 

RNAi validation (Table S2). Out of the 16 candidates, Synaptotagmin 
VII (Syt7) was the only one that passed the validation: its 
knockdown, but not that of the others, caused clear cholesterol accumulation 
in cells (Figure S4C). 

Synaptotagmins are a family of proteins involved in vesicle 
interaction and fusion. Syt7 is widely expressed and plays 
important roles in lysosomal exocytosis, membrane resealing, 
and wound healing (Andrews and Chakrabarti, 2005). Syt7 
mainly colocalized with the lysosome marker LAMP1 (Figure 4A). 
Similarly to NPC1 and LAMP1, Syt7 significantly colocalized 
with the peroxisome marker PMP70 but not markers 
for Golgi or early endosome (Figures 4A and 4B). Knockdown 
of Syt7 resulted in cholesterol accumulation in lysosome 
(Figure 4C), and the LPMC was also dramatically diminished 
(Figure 4D). Syt7 is a transmembrane protein with a short N-terminal 
ectodomain, a single transmembrane segment, and a 
large cytosolic region containing two tandem Ca²⁺-binding C2 
domains (C2A and C2B, Figure 4E). The C2A and C2B domains 
are responsible for the Ca²⁺-dependent interactions between 
Syt7 and SNAREs or phospholipids. When overexpressed, 
these domains compete for binding to SNAREs or phospholipids 
and function as dominant-negative forms (Desai et al., 
2000). We utilized a similar method and found that overexpression 
of C2A or C2B domain dramatically inhibited LPMC in cell 
imaging and organelle coimmunoprecipitation (coIP) (Figures 
4F, 4G, and S4D), accompanied by cholesterol accumulation 
in cells (Figure 4H). 

We further developed an in vitro reconstitution assay to 
dissect the mechanism of LPMC (Figure S5A). Briefly,
EGFP-His6-SKL-labeled peroxisome and NPC1-FLAG-mCherry-labeled
lysosome were separately isolated by density gradient
centrifugation. The peroxisomes were further precipitated by 
Ni Sepharoses and incubated with purified lysosome fractions. 
After incubation, Ni Sepharoses were separated by centrifuga- 
tion and subjected to confocal microscopy and western blot. 
As shown in Figure S5B, lysosome (labeled red) was pulled
down with peroxisome in the presence of cytosol and ATP/
GTP. Consistently, the lysosome marker NPC1-FLAG-mCherry
was co-precipitated under this condition (Figure 4I). These results
suggest that energy and some cytosolic proteins may facilitate
LPMC. Addition of dominant-negative Syt7-C2AB protein
(Figure S5C) in the incubation step blocked the lysosome-peroxisome
interaction (Figures 4J and S5D). Similarly, when lysosomes
from Syt7 or NPC1 RNAi-depleted cells were incubated
with control peroxisome, the LPMC was significantly reduced.
Conversely, when lysosome from control cells was incubated 
with peroxisome from Syt7 or NPC1 RNAi-depleted cells, the 
LPMC was not affected (Figures 4K and S5E). These findings 
indicate Syt7 is a lysosomal protein required for LPMC 
formation. 



PI(4,5)P2 in Peroxisome Membrane Bridges LPMC 

It has been documented that SNAREs mediate membrane con- 
tacts and fusion throughout the secretory pathway (Chen and 
Scheller, 2001; Weber et al., 1998). Organelles such as Golgi, 
ER, and lysosome are all maintained by SNARE-based fusion 
events. However, so far, no peroxisomal SNARE protein has 
been identified. Consistent with previous studies (Matsumoto 
et al., 2003), no SNARE family protein was identified in our perox- 
isomal proteomics (Table S2). Because Syt7 binds to phospho- 
lipids besides SNARE, we hypothesized that Syt7-mediated 
LPMC might be through its interaction with peroxisomal phos- 
pholipids. To test this hypothesis, we examined the binding 
specificity of Syt7-C2AB to various phospholipids in a PIP-strip 
assay. Syt7-C2AB mainly bound PI(4,5)P2 and to a much lesser 
extent PI(5)P and PS; no signal was observed for other phospho- 
lipids (Figure 5A). It has been reported that peroxisome can 
synthesize significant amounts of PIP2 including PI(4,5)P2 (Jey- 
nov et al., 2006). To further validate Syt7-PI(4,5)P2 interaction 
under a more relevant format, we performed the liposome flota- 
tion assay using liposomes mimicking phospholipid composition 
of the mammalian peroxisome membrane (PC:PE:PI:PS = 
54:36:5:5) (Hardeman et al., 1990). As shown in Figure 5B, 
when mixed with blank liposomes or PI5P-containing liposomes,
the His6-C2AB protein was predominantly detected in the
bottom fraction. Trace amounts of His6-C2AB were also detected
in the middle and top fractions, possibly due to the weak binding
of C2AB to PS and PI5P. In contrast, the majority of His6-C2AB
protein co-floated with liposomes containing PI(4,5)P2 to
the top fraction. These results demonstrate that the C2AB
domain of Syt7 interacts with PI(4,5)P2 in membranes.
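
The semiquantitative readout of the flotation assay is described in the legend of Figure 5B as the protein in the top fraction relative to the total across the top, middle, and bottom fractions, which reduces to a simple ratio. The following is a minimal Python sketch of that ratio, assuming densitometric band intensities are already available as numbers. The function name and the example intensities are hypothetical and are not data from this study.

```python
def liposome_bound_fraction(top, middle, bottom):
    """Fraction of protein that co-floated with liposomes.

    top, middle, bottom: densitometric band intensities (arbitrary units)
    for the three gradient fractions of a single sample.
    """
    total = top + middle + bottom
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return top / total


# Hypothetical intensities, for illustration only.
example = liposome_bound_fraction(top=60.0, middle=15.0, bottom=25.0)
print(f"{example:.2f}")
```

Expressing binding as a fraction of the summed signal makes samples comparable even when total protein loading differs between lanes.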

Next, we sought to determine whether the Syt7-PI(4,5)P2 inter- 
action functions to bridge LPMC using an inducible FKBP12- 
FRB heterodimerization system to deplete PI(4,5)P2 on peroxi- 
some (Figure 5C). In the constructed SV589 cells, FKBP12 was 
targeted to peroxisome by fusion with PEX-mCherry, and the 
inositol polyphosphate 5-phosphatase synaptojanin 2 (SYNJ2) 
was kept in cytoplasm fused with mCitrine-FRB. Application of 
the chemical inducer rapamycin led to peroxisome membrane 
recruitment of mCitrine-FRB-SYNJ2 by binding PEX-mCherry- 
FKBP12 (Kapitein et al., 2010), which rapidly and irreversibly 
converted PI(4,5)P2 to PI(4)P (Figures 5C and 5D). As shown in
Figures 5E and 5F, rapamycin treatment caused a significant
decrease in LPMC and led to cellular cholesterol accumulation. Cells
expressing only mCitrine-FRB served as a control and showed no
change in LPMC or cholesterol accumulation. Although cellular
PI(4,5)P2 is also present on the PM, depletion of PI(4,5)P2 in the PM by
a similar strategy did not decrease LPMC or cause cholesterol
accumulation (Figures S6A-S6C). Furthermore, anti-PI(4,5)P2
antibody specifically reduced the association between lysosome 
and peroxisome in vitro (Figures 5G, 5H, and S5F). Together, 



(D) HeLa cells were incubated in cholesterol-depleting medium for 16 hr and then refed with LDL for different time durations. Cells were stained with filipin (gray)
and antibodies against LAMP1 (red) and PMP70 (green). Scale bar, 2 μm.

(E) Quantification of LPMC in (D). Data represent mean ± SD (n = 4, 35 cells per independent experiment). **p < 0.01, *p < 0.05.

(F) Organelle co-precipitation assay was performed to validate LPMC when cells were grown under conditions shown in (D).

(G) SV589 cells transfected with indicated siRNAs were stained with filipin (gray) and antibodies against LAMP1 (red) and PMP70 (green). Scale bar, 2 μm.

(H) Quantification of LPMC in (G). Data represent mean ± SD (n = 4, 35 cells per independent experiment). **p < 0.01.

See also Figure S3. 
Figure 4. Synaptotagmin VII Is a Lysosomal Protein Bridging LPMC

(A) SV589 cells transfected with Syt7-mCherry were assessed by immunostaining with indicated antibodies. Scale bar, 2 μm.

(B) Quantification of Syt7 colocalization with organelle-specific markers. Data represent mean ± SD (n = 4, 35 cells per independent experiment). 

these data demonstrate that PI(4,5)P2 in peroxisome membrane
is required for LPMC and proper cholesterol transport. 

Because PI(4,5)P2 is critical for LPMC, we reasoned that the 
peroxisome genes from our screen might affect peroxisomal 
PI(4,5)P2 level either directly or indirectly. Indeed, a pronounced 
decrease in the amount of PI(4,5)P2 in peroxisomal lipid extrac- 
tion was detected using dot blots with anti-PI(4,5)P2 antibody 
after knocking down ABCD1 or other peroxisomal hits (Fig- 
ure S6D). These data suggest that the nine peroxisome proteins 
may not directly bind Syt7 but rather influence peroxisomal 
PI(4,5)P2 level thereby affecting lysosome association. 

Cholesterol Transport through LPMC 

To monitor cholesterol transport directly, we used ³H-cholesterol
in the in vitro reconstitution assay (Figure 6A). Briefly,
³H-cholesterol-labeled lysosome was isolated by density centrifugation
from HEK293T cells pre-incubated with ³H-cholesterol.
Peroxisome was purified from unlabeled cells. The lysosome
and peroxisome were then applied to the in vitro reconstitution
system. After incubation, EGTA washing was performed to
dissociate lysosome from peroxisome while leaving the peroxisome
on Ni Sepharoses. The ³H-cholesterol in peroxisome
was then measured. To control for specificity, antibodies
against PI(4,5)P2 or unrelated IgG were applied. The ³H-cholesterol
in peroxisome increased in a time-dependent manner, and
this increase was blocked by anti-PI(4,5)P2 antibody (Figures
6B and S7A). In addition, lysosomes prepared from NPC1 or
Syt7 RNAi cells failed to support cholesterol transfer to peroxisome
(Figure 6C), because LPMC did not form when NPC1 or
Syt7 was depleted from lysosome (Figures 4K and S5E). These
data demonstrate that cholesterol can transfer from lysosome
to peroxisome in an LPMC-dependent manner in vitro.

What about in cells? We performed confocal microscopy on 
HeLa cells refed with LDL and observed a time-dependent in- 
crease of co-localization between peroxisome and cholesterol 
(Figures S7B and S7C). We also directly measured the cholesterol
level in isolated lysosome and peroxisome after incubation
with ³H-cholesteryl oleate-containing LDL (scheme in Figure S7D).
The lysosome and peroxisome were both labeled by
³H-cholesterol, although the peroxisome label was lower (Figure 6D).
Western blots of organelle markers excluded
contamination with other organelles (Figure S7E). Furthermore,



knockdown of NPC1 or ABCD1 caused a significant increase of
³H-cholesterol in lysosome and a decrease in peroxisome (Figure 6D).
An LDL pulse-chase experiment (Figure S7F) followed by
SR-SIM microscopy showed that peroxisomes overlapped
with cholesterol-loaded lysosomes or with cholesterol (Figure S7G).
These data suggest cholesterol flows from lysosome
to peroxisome in cells.

To further investigate whether LPMC is required for LDL- 
cholesterol transport to the ER, we performed SREBP cleavage 
and cholesterol esterification assays because it is well estab- 
lished that cholesterol derived from LDL prevents SREBP pro- 
cessing and stimulates cholesterol esterification once it reaches 
the ER. The results showed that LDL-cholesterol could efficiently 
block SREBP processing (Figure 6E) and stimulate cholesterol 
esterification (Figure 6F) in control cells. However, these effects
were markedly blunted in NPC1, ABCD1, or Syt7 RNAi cells (Figures
6E and 6F), demonstrating that cholesterol transport to the ER
was largely impaired when LPMC was disrupted.

With the current data and information from previous reports 
(Kwon et al., 2009), we propose the following model for cholesterol
transport from lysosome to peroxisome. After internalization, 
LDL particles are delivered to lysosome where LDL-containing 
cholesteryl ester is hydrolyzed to unesterified cholesterol. The 
luminal NPC2 protein binds free cholesterol with the 8-carbon 
isooctyl side chain buried within the binding pocket and hands 
over the cholesterol molecule to the N-terminal domain of 
NPC1. The NPC1 N-terminal domain can penetrate the glycocalyx
and facilitate insertion of cholesterol into the lysosomal
membrane. Lysosome and peroxisome form close membrane 
contacts through interaction between Syt7 and PI(4,5)P2. Thus, 
cholesterol can move from lysosome to peroxisome (Figure 6G). 

Intracellular Cholesterol Accumulation in Peroxisomal 
Disorders 

ABCD1 mutation causes X-ALD, which is a neurological disease 
with progressive CNS demyelination and adrenal insufficiency 
(Forss-Petter et al., 1997). X-ALD is one of the prevalent peroxi- 
somal disorders and there is no effective treatment (Moser et al., 
2005). Our work has demonstrated that cholesterol is transported from
lysosome to peroxisome through LPMC, and that ABCD1 depletion
impairs LPMC and leads to cholesterol accumulation. However, 
there is no previous report on cholesterol transport defect in 



(C) SV589 cells transfected with indicated siRNAs were stained with filipin (gray) and antibodies against LAMP1 (green) and PMP70 (red). Insets show high
magnification of the areas framed by a white box. Scale bar, 10 μm.

(D) Quantification of LPMC in (C). Data represent mean ± SD (n = 4, 35 cells per independent experiment). **p < 0.01.

(E) Domain structure of the Syt7 protein.

(F) SV589 cells transfected with mCherry, Syt7, C2A, or C2B of Syt7 were assessed by immunostaining with antibodies against LAMP1 and PMP70. Shown is the
quantification of LPMC. Data represent mean ± SD (n = 4, 30 cells per independent experiment). NS, not significant, **p < 0.01. The fluorescence images of cells
are shown in Figure S4D.

(G) HeLa/EGFP-His6-SKL cells were transfected with the indicated plasmids and the lysosome-peroxisome association was analyzed by organelle
co-precipitation assay.

(H) SV589 cells transfected with the indicated plasmids were stained with filipin (gray). Arrowheads indicate the cells expressing Syt7 or Syt7 variants (magenta).
Scale bar, 10 μm.

(I) In vitro reconstitution of LPMC. The images of Ni Sepharoses are shown in Figure S5B. 

(J) Recombinant GST or Syt7-C2AB protein was applied in the in vitro reconstitution system. The images of Ni Sepharoses are shown in Figure S5D. 

(K) Lysosome or peroxisome was purified from cells transfected with indicated siRNAs and then used for the in vitro reconstitution assay. The images of Ni 
Sepharoses are shown in Figure S5E. Ctr, control. 

See also Figures S4 and S5 and Table S2. 
Figure 5. PI(4,5)P2 of Peroxisome Is Required for LPMC

(A) Protein-lipid overlay. A scheme of the PIP-strip membrane is shown (left). Arrowheads indicate specific lipids binding. Red lines highlight the phospholipid 
species. 



X-ALD or any other peroxisomal disorders. Therefore, we sought
to validate our findings in vivo by examining if there is cholesterol 
accumulation in ABCD1 knockout (KO) animal models and 
fibroblasts of human patients with different types of peroxisomal 
disorders. 

As shown in Figure 7A, cholesterol accumulated in zebrafish
embryo cells injected with morpholino antisense oligomer (MO)
against NPC1 or ABCD1. Furthermore, cholesterol accumulation
was observed in fibroblasts, cerebellum, and adrenal gland of
ABCD1 KO mice (Figures 7B and 7C), a well-accepted animal
model capturing the pathological characteristics of X-ALD. Interestingly,
in the adrenal gland cholesterol deposits were located
almost exclusively in the cortex but not in the medulla (Figure 7C),
correlating with the specific expression of ABCD1 in the cortex
(Troffer-Charlier et al., 1998).

Because ABCD1 KO mice are known not to show an
abnormal behavioral or neurological phenotype up to 15 months,
we analyzed the behavioral deficits associated with CNS demyelination
using the rotarod test at the ages of 7 and 20 months.
When compared with WT littermates, the 20-month-old
ABCD1 KO mice displayed a marked impairment (19%) in their
ability to stay on top of a rotating cylinder during a 2-day trial, while
the 7-month-old ABCD1 KO mice were not affected (Figure 7D).
The open field mobility paradigm was also used to study spontaneous
locomotion and exploratory behavior. As shown in Figures
7E and 7F, the 20-month-old ABCD1 KO mice exhibited significantly
fewer rearings and traveled shorter distances
in comparison with WT mice or 7-month-old ABCD1 KO mice.
This is important because the cholesterol accumulation occurs
as early as 7 months (Figure 7C), long before the manifestation
of the neurological phenotypes (20 months), suggesting not
only that loss of ABCD1 leads to cholesterol trafficking defects,
but also that intracellular cholesterol accumulation might be a
mechanism causing X-ALD symptoms.

To further evaluate the role of peroxisome in cholesterol 
trafficking, cultured fibroblasts from patients with X-ALD or with
two peroxisome biogenesis disorders, Infantile Refsum disease
(IRD) and Zellweger syndrome (ZS), were used for cholesterol
staining. As shown in Figure 7G, drastic cholesterol accumula- 
tion was observed in these fibroblasts, suggesting peroxisome 
plays an essential role in intracellular cholesterol transport. 

DISCUSSION 

Using an elegantly designed cellular system, our genome-wide 
shRNA screen allows a comprehensive dissection of the genes 



and pathways that may regulate intracellular cholesterol transport.
Besides previously known cholesterol transport genes
such as NPC1, we uncovered over 300 additional genes, among
which the genes encoding peroxisomal proteins were highly 
enriched. 

We showed that peroxisome played an essential role in intra- 
cellular cholesterol transport through forming membrane con- 
tacts with lysosome. We provided multiple lines of evidence to 
solidify this observation. First, LPMC was observed by confocal
microscopy and decreased upon cholesterol depletion and knockdown
of NPC1 or ABCD1. Second, super-resolution microscopy
showed overlapping signals between peroxisome
and lysosome (Figure 2E). Third, 3D reconstruction verified
LPMC from different angles (Figure 2F). Fourth, transmission
electron micrographs directly observed LPMC in primary mouse 
hepatocytes (Figure 2G). Fifth, time-lapse imaging showed the 
LPMC is dynamic in living cells (Figure 2H). Sixth, organelle co- 
precipitation assay detected the physical interaction between 
peroxisomes and lysosomes (Figures 3C and S3F). Seventh, 
in vitro reconstitution assay confirmed that lysosome and perox- 
isome can form contacts specifically (Figures 4I and S5). 

As for the molecules bridging LPMC, our data demonstrate 
lysosomal protein Syt7 binds peroxisomal lipid PI(4,5)P2 to 
form a transient contact. How are Syt7 activation and peroxi- 
somal PI(4,5)P2 level regulated? It is well known that calcium 
can bind Syt7 leading to a conformational change (Fukuda and 
Mikoshiba, 2001). Meanwhile, the level of PI(4,5)P2 can be 
modulated by phosphatidylinositol kinases and phosphatases. 
Its distribution is also under dynamic regulation. Therefore, 
how different proteins/pathways regulate Syt7 and PI(4,5)P2
and then influence LPMC and cholesterol transport is a particularly
interesting subject for further exploration. The in vitro reconstitution
assay developed in this study would be a powerful tool for such studies. Our
screen discovered 9 peroxisomal proteins including ABCD1, 
knockdown of which individually leads to lowered peroxisomal 
PI(4,5)P2 level (Figure S6D). These nine peroxisomal proteins 
cover different functions and are all required for proper peroxi- 
somal function. Therefore, the dysfunction of peroxisome may 
underlie the decrease of PI(4,5)P2 and LPMC. Further studies 
are still needed to understand how these peroxisome proteins 
are functionally connected to PI(4,5)P2 regulation. 

Previous studies have indicated that cholesterol can leave lyso- 
some by vesicular or non-vesicular transport. Urano et al. (2008) 
showed that LDL-cholesterol can be transported from L/L to the 
trans-Golgi network through vesicular trafficking. Du et al. 
(2011) reported that ORP5, an oxysterol-binding protein-related 



(B) B': workflow of the liposome flotation assay. B": the presence of recombinant proteins in the top (T), middle (M), and bottom (B) fractions was detected by
western blot using anti-His6 antibody. B'": semiquantitative densitometric analysis of the western blot in B". The amount of liposome-associated protein was
determined by comparing proteins present in the top fraction to the total amount of proteins present in the top, middle, and bottom fractions.

(C) Schematic representation of the rapamycin-inducible heterodimerization system used to recruit SYNJ2 to the peroxisome membrane.

(D) Validation of the rapamycin-inducible system in SV589 cells. Scale bar, 10 μm.

(E) SV589 cells were transfected with PEX-mCherry-FKBP12 together with either mCitrine-FRB or mCitrine-FRB-SYNJ2. Cells were then treated with rapamycin,
stained with filipin (gray), and immunostained with antibody against LAMP1, followed by Cy5-conjugated anti-mouse secondary antibody (pseudocolor, red).
Scale bar, 2 μm.

(F) Quantification of LPMC in (E). Data represent mean ± SD (n = 4, 30 cells per independent experiment). NS, not significant, **p < 0.01.

(G) Anti-PI(4,5)P2 or control IgG was applied in the in vitro reconstitution system. The images of Ni Sepharoses are shown in Figure S5F. 

(H) Semiquantitative densitometric analyses of (G). 

See also Figures S5 and S6. 
protein, may mediate cholesterol efflux from lysosome to ER
through binding cholesterol and NPC1. Here, cholesterol transport
across LPMC is another mechanism for cholesterol efflux
from lysosome. Disruption of LPMC by different means causes 
significant lysosomal cholesterol accumulation. X-ALD animal 
models and fibroblasts of human patients with different types of 
peroxisomal disorders displayed drastic cholesterol accumula- 
tion (Figure 7), suggesting LPMC is a major route for cholesterol 
to leave the lysosomal membrane. Our in vitro reconstitution 
assay suggests cytosol may facilitate cholesterol movement 
from lysosome to peroxisome (Figure 4I). Finally, it is possible 
that cytosolic cholesterol binding proteins such as StarD4 and 
ORPs may accelerate the cholesterol movement from lysosomal 
membrane to peroxisome when LPMC forms. 

After reaching peroxisome, the cholesterol might be further 
oxidized or participate in bile acid synthesis in peroxisome. 
Cholesterol is also required for peroxisome lipid raft assembly 
and peroxisome biogenesis (van der Zand and Tabak, 2013; 
Woudenberg et al., 2010), and we estimated peroxisome con- 
tains ~5% of total cellular cholesterol (data not shown). Because 
disrupting LPMC decreases PM cholesterol level (Figure 2A) and 
impairs LDL-cholesterol reaching the ER (Figures 6E and 6F), it is 
likely that peroxisome may associate with other organelles and 
deliver cholesterol to them. This notion is further supported 
by the observation that cholesterol in lysosome increased by 
20%-40% whereas cholesterol in peroxisome only decreased 
by ~2% after LPMC disruption in cells (Figure 6D). Alternatively, 
cholesterol transport via LPMC may be tightly coupled with 
cholesterol modification including oxidation and esterification. 
It is interesting to further study how cholesterol transportation 
is affected by cholesterol modification and vice versa. Besides 
cholesterol transfer, LPMC may regulate other functions of lyso- 
some and peroxisome, such as autophagy, mTOR signaling, and 
peroxisome biogenesis. 

Dramatic cholesterol accumulation was observed in X-ALD 
animal models and human patients’ fibroblasts with mutations 
in different peroxisomal genes (Figure 7). Notably, the choles- 
terol accumulation (7-month-old) occurs long before the 
manifestation of the neurological phenotypes (20-month-old), 
suggesting intracellular cholesterol accumulation might be a 
potential mechanism causing X-ALD symptoms. It was also 
noted that although there was early onset of very long chain fatty 



acid accumulation, relief of its accumulation did not significantly 
improve the disease symptoms (Prieto Tenreiro et al., 2013). On 
the other hand, it was well established that the accumulation of 
cholesterol in NPC disease patients is the cause of neuron death 
and neurological phenotypes. Mobilizing cholesterol by cyclo- 
dextrin constitutes a beneficial treatment for NPC patients (Liu 
et al., 2010). Therefore, the cholesterol trafficking blockage 
may underlie the pathological mechanism of peroxisome disor- 
ders, which could provide novel strategies for diagnosis and 
treatment of these diseases. 

In summary, through a functional genome-wide RNAi screen
and hit analysis, we demonstrate the existence of lysosome-peroxisome
membrane contacts mediated by Syt7-PI(4,5)P2
binding, through which cholesterol is transported from lysosome 
to peroxisome. Peroxisomal disorders display significant intra- 
cellular cholesterol accumulation prior to neuronal symptoms. 
Together, this study suggests a central role of peroxisome in 
intracellular cholesterol trafficking and highlights the clinical rele- 
vance of cholesterol transport in peroxisomal disorders. 

EXPERIMENTAL PROCEDURES 

Materials and plasmids, cell culture, growth assay, liposome flotation assay, 
and other procedures are described in the Extended Experimental 
Procedures. 

shRNA Screen and Analysis 

HeLa cells were infected with the MISSION LentiPlex human pooled shRNA
library, which consists of over 75,000 shRNA constructs from the TRC collection
targeting 15,000+ human genes. Infected cells were selected with puromycin
(2 μg/ml) for 4 days. After five rounds of AmB selection (Extended Experimental
Procedures), surviving populations were collected, and shRNA inserts
were amplified from genomic DNA by PCR. PCR products were sequenced
by deep sequencing. All the deep sequencing data were log10 transformed
and normalized to the standard deviation from the screen-wide mean, which is
expressed as a Z score [Z = (gene’s deep sequencing score - average deep
sequencing score)/screen standard deviation]. A Z score of 1.96 (p =
0.05) was used as the cut-off value to determine the screen hits. Genes with
a Z score over 1.96 (p < 0.05) or targeted by five independent shRNAs were
considered screen hits.
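
The hit-calling rule above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the analysis code used in the study: it assumes one positive deep sequencing score per gene, uses the population standard deviation (the text does not say which estimator was used), and reads "targeted by five independent shRNAs" as five or more shRNAs recovered for that gene. The function names, gene labels, and counts are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_scores(read_counts):
    """Return {gene: Z} from raw deep sequencing scores.

    Scores are log10-transformed, centered on the screen-wide mean,
    and scaled by the screen-wide standard deviation.
    """
    logs = {gene: math.log10(count) for gene, count in read_counts.items()}
    values = list(logs.values())
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD; the source does not specify
    if sd == 0:
        raise ValueError("all scores are identical; Z is undefined")
    return {gene: (value - mu) / sd for gene, value in logs.items()}


def screen_hits(read_counts, shrnas_per_gene, z_cutoff=1.96, min_shrnas=5):
    """Genes with Z above the cut-off, or recovered with >= min_shrnas shRNAs."""
    z = z_scores(read_counts)
    return sorted(
        gene for gene in z
        if z[gene] > z_cutoff or shrnas_per_gene.get(gene, 0) >= min_shrnas
    )


# Hypothetical counts: nine background genes and one strongly enriched gene.
counts = {f"G{i}": 100 for i in range(1, 10)}
counts["G10"] = 10000
print(screen_hits(counts, shrnas_per_gene={"G1": 5}))
```

In this toy input, G10 is called by the Z-score criterion and G1 by the shRNA-count criterion, which shows that the two criteria are applied independently.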

Organelle Co-Precipitation Assay 

Triplicate samples for each treatment were homogenized in extraction buffer
(5 mM MOPS [pH 7.65], with 0.25 M sucrose, 1 mM EDTA, 0.1% ethanol
and protease inhibitors) and centrifuged at 1,000 × g for 10 min. Supernatants



Figure 6. Transfer of ³H-Cholesterol from Lysosome to Peroxisome

(A) Outline of the in vitro ³H-cholesterol transfer assay.

(B) Ni Sepharoses bound-peroxisome was preincubated with anti-PI(4,5)P2 or control IgG and then used for the in vitro ³H-cholesterol transfer assay. Values are
expressed as the percentage of ³H-cholesterol in lysosome prior to reaction and presented as the mean ± SD of three independent repeats of experiments.
**p < 0.01.

(C) Radiolabeled lysosomes were isolated from cells transfected with indicated siRNAs. Peroxisome was purified from wild-type cells and was then used for the
in vitro ³H-cholesterol transfer assay as in (A). Data are presented as the mean ± SD of three independent repeats of the experiments. **p < 0.01.

(D) HEK293T cells transfected with indicated siRNAs were depleted of cholesterol and then pulsed with ³H-cholesteryl oleate-LDL for 3 hr. Then, lysosome and
peroxisome were purified separately and the ³H-cholesterol was measured. Values are expressed as percentage of control lysosome and presented as the mean
± SD of three independent experiments. *p < 0.05, **p < 0.01.

(E) HeLa cells transfected with indicated siRNAs were subjected to analysis of SREBP-2 cleavage. pSREBP2, precursor of SREBP2; nSREBP2, nuclear form of
SREBP2; CHC, clathrin heavy chain. *Indicates the nonspecific band.

(F) HeLa cells transfected with the indicated siRNAs were subjected to cholesterol esterification assay. TG, triacylglycerol. Quantification of cholesteryl [¹⁴C]-esters
was analyzed by ImageJ. NS, not significant, **p < 0.01.

(G) A working mechanism of LDL-derived cholesterol transport out of lysosome. 

See also Figure S7. 
Figure 7. Cholesterol Accumulation in Animals and Human Patients with Peroxisomal Disorders

(A) Filipin staining of unesterified cholesterol in zebrafish embryos. Scale bar, 10 μm.

(B) Filipin staining of the tail-tip fibroblast cells from the mice at the age of 7 months (n = 4 per group). Scale bar, 10 μm.

were incubated with Ni Sepharoses at 4°C rotating for 2 hr. Beads were
washed five times with extraction buffer. Then, 1 μl Ni Sepharoses were
mounted and analyzed by confocal microscope. Proteins bound to Sepharoses
were eluted and subjected to western blot.

In Vitro Reconstitution Assay 

EGFP-His6-SKL-labeled peroxisome and NPC1-FLAG-mCherry-labeled lysosome
were first isolated by iodixanol density gradient centrifugation, respectively.
The lysosome fractions were diluted with reconstitution buffer
(250 mM sucrose, 1 mM DTT, 1 mM MgCl2, 50 mM KCl, and 20 mM HEPES
[pH 7.2]), precipitated at 28,000 × g for 30 min and resuspended in reconstitution
buffer. The peroxisome fractions were incubated with Ni Sepharoses,
washed with reconstitution buffer plus 2 mM EGTA four times, and then
incubated with lysosome in the presence or absence of 1 mg/ml cytosol,
1 mM ATP, 1 mM GTP, and ATP-regenerating system (30 mM creatine phosphate,
0.05 mg/ml creatine kinase) at 37°C for 30 min. If needed, anti-PI(4,5)P2
and Syt7-C2AB protein were applied 10 min at 4°C before the addition of
ATP-regenerating system and cytosol. Ni Sepharoses were spun down, washed
with reconstitution buffer, and subjected to microscopy and western blot.

In Vitro ^H-Cholesterol Transfer Assay 

Cells were incubated with 1 μCi/ml ³H-cholesterol in growth medium overnight.
The cells were then washed with PBS containing 0.2% BSA twice.
The ³H-cholesterol-labeled lysosome was isolated and its content of
³H-cholesterol was measured using liquid scintillation. Peroxisome was purified
from unlabeled cells by the aforementioned method. The peroxisome and
³H-cholesterol-labeled lysosome were subjected to in vitro reconstitution assay
as described above. After incubation for different time durations at 37°C, the
samples were spun down and lysosome was washed off by washing with
reconstitution buffer plus 2 mM EGTA four times. The ³H-cholesterol on
Ni Sepharoses bound-peroxisome was measured by liquid scintillation. The
³H-cholesterol content in peroxisome was then normalized to the total input
³H-cholesterol content of lysosome prior to reaction and was expressed as
a percentage.
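
The normalization in the last step is a percentage of input. Below is a minimal Python sketch, assuming scintillation counts are available for the bead-bound peroxisome at each time point and for the lysosome input before the reaction. The function name and all numbers are hypothetical and are not measurements from this study.

```python
def percent_transferred(peroxisome_counts, lysosome_input_counts):
    """3H-cholesterol recovered on peroxisome as a percentage of lysosome input."""
    if lysosome_input_counts <= 0:
        raise ValueError("lysosome input counts must be positive")
    return 100.0 * peroxisome_counts / lysosome_input_counts


# Hypothetical time course: minutes of incubation -> counts on peroxisome.
time_course = {0: 50.0, 15: 400.0, 30: 900.0}
lysosome_input = 10000.0

for minutes, counts in sorted(time_course.items()):
    print(minutes, percent_transferred(counts, lysosome_input))
```

Normalizing to the input measured before the reaction allows transfer to be compared across lysosome preparations that carry different amounts of label.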

Measurement of LDL-Derived ^H-Cholesterol in Lysosome and 
Peroxisome 

After transfection with the indicated siRNAs, HEK293T cells were cultured in
cholesterol-depleting medium for 16 hr and incubated with ³H-cholesteryl
oleate-LDL for 3 hr at 37°C. Then the cells were washed with PBS containing
0.2% BSA twice. Lysosome or peroxisome fractions were isolated by density
gradient centrifugation separately and analyzed by liquid scintillation
counting.

Animals and Treatment 

All animals were maintained and used in accordance with the guidelines of the
Institutional Animal Care and Use Committee of the Shanghai Institutes for
Biological Sciences. Mice were treated as described in the figure legends.

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, seven
figures, two tables, and two movies and can be found with this article online at
http://dx.doi.org/10.1016/j.cell.2015.02.019.
SUMMARY

Protein-DNA binding is mediated by the recognition 
of the chemical signatures of the DNA bases and 
the 3D shape of the DNA molecule. Because DNA 
shape is a consequence of sequence, it is difficult 
to dissociate these modes of recognition. Here, we 
tease them apart in the context of Hox-DNA binding 
by mutating residues that, in a co-crystal structure, 
only recognize DNA shape. Complexes made with 
these mutants lose the preference to bind se- 
quences with specific DNA shape features. Intro- 
ducing shape-recognizing residues from one Hox 
protein to another swapped binding specificities 
in vitro and gene regulation in vivo. Statistical ma- 
chine learning revealed that the accuracy of binding 
specificity predictions improves by adding shape 
features to a model that only depends on sequence, 
and feature selection identified shape features 
important for recognition. Thus, shape readout is a 
direct and independent component of binding site 
selection by Hox proteins. 

INTRODUCTION 

Precise control of gene expression relies on the ability of tran- 
scription factors to recognize specific DNA binding sites. Two 
distinct modes of protein-DNA recognition have been described: 
base readout, the formation of hydrogen bonds or hydrophobic 
contacts with functional groups of the DNA bases, primarily in 
the major groove (Seeman et al., 1976), and shape readout, 
the recognition of the 3D structure of the DNA double helix 
(Rohs et al., 2009a). The importance of shape readout has 
been inferred from crystal structures of protein-DNA complexes 
(Joshi et al., 2007; Kitayner et al., 2010; Meijsing et al., 2009; 




Rohs et al., 2009b) and from structural features of DNAs selected
by DNA-binding proteins in high-throughput binding assays
(Dror et al., 2014; Gordan et al., 2013; Lazarovici et al., 2013;
Slattery et al., 2011; Yang et al., 2014). However, as DNA shape
is a function of the nucleotide sequence, it is difficult to tease 
apart whether a DNA binding protein favors a particular binding 
site because it recognizes its nucleotide sequence or, alterna- 
tively, structural features of the DNA molecule. Thus, whether 
DNA shape is a direct determinant of protein-DNA recognition 
remains an open question. In addition to being a potentially 
important mode of DNA recognition, if DNA binding proteins 
directly use shape readout then incorporating DNA structural in- 
formation should significantly improve models for predicting 
DNA binding specificity, which remains challenging with existing 
methods (Slattery et al., 2014; Weirauch et al., 2013). 

We previously described a role for DNA shape in the recogni- 
tion of specific binding sites by the Hox family of transcription 
factors, which in vertebrates and Drosophila specify the unique 
characteristics of embryonic segments along the anterior-poste- 
rior axis (Joshi et al., 2007; Mann et al., 2009; Slattery et al., 
2011). Using in vitro selection combined with deep sequencing 
(SELEX-seq), which examines millions of sequences in an unbi- 
ased manner, we found that while Hox proteins bind highly 
similar sequences as monomers, heterodimerization with the 
cofactor Extradenticle (Exd) uncovers latent DNA binding spec- 
ificities (Slattery et al., 2011). High-throughput DNA shape pre- 
dictions (Zhou et al., 2013) for sequences selected by each 
Exd-Hox complex (containing the motif NGAYNNAY) revealed 
that anterior and posterior Hox proteins prefer sequences with 
distinct minor groove (MG) topographies. Whereas all Exd-Hox 
complexes preferred sequences with a narrow MG near the AY 
of the Exd half-site (NGAY), only anterior Hox proteins (Lab, 
Pb, Dfd, and Scr) selected for sequences containing an addi- 
tional minimum in MG width at the AY of the Hox half-site 
(NNAY) (Figures 1A and S1) (Slattery et al., 2011). However, 
this study, as well as analyses of other protein-DNA complexes 
(Gordan et al., 2013; Yang et al., 2014), did not rule out the 
possibility that these shape preferences were merely a secondary 
consequence of base readout preferences. 

Figure 1. Scr’s Narrow-MG Recognizing Residues Are Required for Binding Specificity 

(A) Two views of the Exd-Scr heterodimer bound to the Scr-specific target fkh250 (Protein Data Bank [PDB] ID 2R5Z) (Joshi et al., 2007). 

(B) Plot of MG width derived from the Exd-Scr co-crystal structure showing that Arg5 (red) inserts into the MG width minimum at the Exd half-site (NGAY) while 
Arg3 and His-12 (blue) insert into the MG width minimum at the Hox half-site (NNAY). 

(C) Amino acid sequences of Scr variants. Numbering is relative to the first residue in the homeodomain. Only sequences from the Exd-interaction motif YPWM 
through the homeodomain N-terminal arm are shown. The rest of the protein is wild-type in all cases. Red highlights mutated residues. 

(D) 12-mer relative affinities of binding sites selected by each Scr variant in complex with Exd are color-coded according to the ten most frequently observed 
Exd-Hox binding sites. 

A key prediction of the shape-recognition model is that if the 
residues that recognize a distinct structural feature of the DNA, 
for example a local minimum in MG width, are mutated then 
the transcription factor should no longer prefer to bind DNA se- 
quences containing that feature. Alternatively, if the structural 
feature were merely a byproduct of the DNA sequences selected 
by a base readout mechanism in the major groove, the binding 
sites preferred by the mutant factor would still contain that 
feature. Here, we tested this prediction using the anterior Hox 
protein Scr, which binds DNA with Exd to regulate Scr-specific 
target genes during Drosophila embryogenesis (Ryoo and 
Mann, 1999). In a co-crystal structure of the Exd-Scr hetero- 
dimer bound to an Scr-specific target site, fkh250 (AGATTAAT), 
both shape readout and base readout mechanisms were evident 
(Joshi et al., 2007). In agreement with the SELEX-seq data, the 
fkh250 binding site contained two MG width minima, one recognized 
by Scr residues His-12 and Arg3, and the second recognized 
by Scr residue Arg5 (Figures 1A and 1B) (Joshi et al., 
2007). As these residues did not form hydrogen bonds with ba- 
ses, the implication is that they use shape readout, and not 
base readout, as their sole mode of DNA recognition. 

To test if Hox proteins directly use shape readout, we charac- 
terized the properties of mutant proteins that, based on the Exd- 
Scr co-crystal structures, are predicted to either lose or gain the 
ability to read specific MG topographies. When MG-inserting 
residues of Scr were mutated to alanines, thus impairing its abil- 
ity to use shape readout, the mutant proteins no longer preferred 
sequences containing these MG width minima. Conversely, 
when MG recognizing residues from Scr were transferred to a 
Hox protein that normally does not select for this structural 
feature, the proteins selected binding sites with two MG minima 
in vitro and gained the ability to activate an Scr-specific target 
gene in vivo. Finally, we show that taking DNA shape features 
into consideration significantly improved the ability to predict 
Exd-Hox binding site specificities compared to models that 
only depend on DNA sequence. Together, these findings 
demonstrate that transcription factors directly use shape 
readout for protein-DNA recognition, and in silico prediction of 
DNA binding specificities will benefit by taking DNA structural 
features into consideration. 

RESULTS 

Mutants that Interfere with Scr’s Ability to Read MG 
Shape 

In an initial set of experiments to tease apart the contributions 
of shape readout from base readout, we mutated Scr residues 
His-12, Arg3, and Arg5, which, in a co-crystal structure, only 
use shape readout as their mode of recognition (Joshi et al., 
2007) (Figures 1A and 1B). We generated a series of mutant proteins 
that change these residues to alanines and, consequently, 
impair Scr’s ability to recognize local MG topographies. We 
mutated either Arg3 alone (ScrArg3A), His-12 alone (ScrHis-12A), 
both His-12 and Arg3 (ScrArg3A,His-12A), or Arg5 alone (ScrArg5A) 
and tested the effect of these mutations in complex with Exd on 
Scr’s DNA binding site preferences using SELEX-seq (Figure 1C). 

Because the binding site for Exd-Hox complexes is 12 base 
pairs (Slattery et al., 2011), we generated 12-mer relative affinities 
for each Scr mutant in complex with Exd using small modifications 
of our previously described procedure (see Experimental 
Procedures) (Riley et al., 2014; Slattery et al., 2011), 
and compared them to the affinities generated by wild-type 
(WT) Exd-Scr heterodimers. We color-coded the 12-mers based 
on their core 8-mer (Figure 1D) (Slattery et al., 2011). Compared 
to ScrWT, all three mutants showed an increased relative preference 
for the green (TGATTGAT), yellow (TGATGGAT), and purple 
(TGATCGAT) motifs, and a decrease in the preference for the 
blue (TGATTAAT) motif (Figures 1D and 1E). Because the blue 
motif includes the Scr-specific fkh250 binding site, we directly 
compared the blue and red (Exd-Hox consensus) motifs by 
plotting the relative affinities of 12-mer pairs that only differed 
at the single position that distinguished them from being blue 
or red (e.g., nnTGATTAATnn with nnTGATTTATnn). Whereas 
ScrWT showed a preference for blue compared to red motifs 
over the entire range of affinities, this preference was weakened 
for ScrArg3A and abolished for ScrHis-12A (Figure 1F). 
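
The pairing behind this blue-versus-red comparison can be illustrated with a minimal Python sketch. It assumes a table of 12-mer relative affinities keyed by sequence, with the 8-mer core occupying positions 3-10; the affinity values and the helper name `paired_affinities` are hypothetical and are not taken from the SELEX-seq data.

```python
# Toy relative affinities for a few 12-mers (hypothetical values, for illustration only).
affinities = {
    "ATTGATTAATGG": 0.80,  # blue core TGATTAAT
    "ATTGATTTATGG": 0.50,  # red core TGATTTAT, identical flanks
    "CCTGATTAATTA": 0.30,  # blue core
    "CCTGATTTATTA": 0.25,  # red core, identical flanks
    "GGTGATTGATCC": 0.40,  # green core, ignored by this comparison
}

BLUE_CORE = "TGATTAAT"
RED_CORE = "TGATTTAT"


def paired_affinities(table):
    """Return (flanks, red_affinity, blue_affinity) for 12-mers that differ only in the core."""
    pairs = []
    for seq, blue_aff in table.items():
        if len(seq) == 12 and seq[2:10] == BLUE_CORE:
            # Build the partner that has the same flanks but the red core.
            partner = seq[:2] + RED_CORE + seq[10:]
            if partner in table:
                pairs.append((seq[:2] + seq[10:], table[partner], blue_aff))
    return sorted(pairs)


pairs = paired_affinities(affinities)
for flanks, red_aff, blue_aff in pairs:
    print(flanks, red_aff, blue_aff)
```

Plotting the blue affinity (y axis) against the red affinity (x axis) for each pair gives the kind of comparison described for Figure 1F; points above the line y = x indicate a preference for the blue motif.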

Although the above results show that Arg3 and His-12 are 
required for Scr’s binding site preferences, they do not address 
if this is due to their preference for a specific MG shape. To determine 
if His-12, Arg3, and Arg5 directly enable the selection of sequences 
with narrow MGs, we computed the average MG width 
profile for thousands of 16-mer sequences that were preferentially 
bound by each Scr variant in our SELEX-seq experiments. 
We employed DNAshape, a high-throughput method for the 
prediction of the structural features of DNA sequences based 
on the average conformations of pentamers derived from all-atom 
Monte Carlo simulations (Zhou et al., 2013). Sequences 
selected by ScrArg3A,His-12A had an average MG width at A9 
and Y10 that was significantly wider compared to those selected 
by ScrWT, without affecting the selection of the MG width minimum 
at A5Y6 (Figures 2, S2, and S3) (p < 2 x 10^-16; Mann-Whitney 
U test). Sequences selected by the single mutant ScrArg3A, 
but not ScrHis-12A, had an intermediate width at A9Y10, suggesting 
that His-12 and Arg3 synergistically contribute to MG 
recognition at the Hox half-site (Figures 2 and S3). Conversely, 
compared to ScrWT, ScrArg5A selected sequences with a wider 



(E) Comparative specificity plots comparing the relative binding affinities of 12-mers selected by Exd-ScrWT (y axis) with each Exd-Scr variant (x axis). Each point 
represents a unique 12-mer that is color-coded according to the core 8-mer it contains. Gray points represent 12-mers that do not contain any of the ten most 
common cores. The black line indicates y = x. 

(F) Plots comparing the relative affinities of sequences containing a blue motif (TGATTAAT) (y axis) versus a red motif (TGATTTAT) (x axis) for Exd-ScrWT and 
Exd-Scr variants. Each point represents the relative affinities of a pair of 12-mers that are identical except for the position that makes it either a blue (nnTGATTAATnn) 
or a red (nnTGATTTATnn) motif. The black line indicates y = x, and the red line is a linear regression trend line. The slope of the trend line and coefficient of 
determination of the data are indicated. 

See also Figures S1, S2, and S3. 
MG specifically in the Exd half-site (A5Y6), but these sequences 
retained the minimum at A9Y10 (Figures 2 and S3). These results 
provide strong support for the idea that Arg5 directly selects sequences 
with a MG minimum at A5Y6, while Arg3/His-12 directly 
select sequences with a MG width minimum at A9Y10. Selection 
of these MG width minima occurs independently, even though 
they are only separated by two base pairs. 
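
The averaging step behind these MG width profiles can be sketched as follows. This is a minimal illustration, assuming (as the description of DNAshape above suggests) that each pentamer maps to one MG width value assigned to its central nucleotide. The lookup table, the default value, and the function names are hypothetical stand-ins and do not reproduce the actual DNAshape tables.

```python
from statistics import mean


def mgw_profile(seq, pentamer_mgw, default=5.0):
    """MG width at every position that is the centre of a pentamer (1-based positions 3..len-2).

    `pentamer_mgw` maps a 5-mer to the width assigned to its central base;
    `default` is only a placeholder for pentamers missing from the toy table.
    """
    return [pentamer_mgw.get(seq[i - 2:i + 3], default) for i in range(2, len(seq) - 2)]


def average_profile(seqs, pentamer_mgw):
    """Average the per-position MG width over a set of equal-length sequences."""
    profiles = [mgw_profile(s, pentamer_mgw) for s in seqs]
    return [mean(column) for column in zip(*profiles)]


# Toy table with a single narrow-groove pentamer (hypothetical value in angstroms).
toy_table = {"AATTA": 3.0}
avg = average_profile(["GGAATTAGG", "GGCCGGCGG"], toy_table)
print(avg)
```

Comparing such averaged profiles between variants, position by position, is what a heat map like the one in Figure 2 summarizes.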

Despite its importance in selecting Scr-specific features of MG 
topography, Arg3 is present in many Hox homeodomains, in- 
cluding Antennapedia (Antp), that do not select a MG minimum 
at A9Y10 (Figures 3A and S1) (Slattery et al., 2011). This observa- 
tion prompted the question of why Arg3 in Antp and other poste- 
rior Hox proteins does not select for a narrow MG at this position. 
We speculated that the amino acids flanking Arg3 might play a 
role in binding site selection by correctly positioning this MG-in- 
serting side chain. Indeed, although both Scr and Antp have Arg3 
and Arg5, these residues are part of an N-terminal arm motif that 
differs between these two Hox proteins (R3Q4R5T6 in Scr and 
R3G4R5Q6 in Antp) (Figure 3A). To test whether residues flanking 
these arginines play a role in Scr binding specificity, we characterized 
an additional mutant, ScrHis-12AGQ (Figures 1C-1F). In 
ScrHis-12AGQ, His-12 is mutated to alanine and the fourth and 
sixth positions in the Scr homeodomain are changed to those of 
Antp (Gln4 to Gly4 and Thr6 to Gln6) to mimic Antp’s R3G4R5Q6 
motif. Strikingly, this mutant failed to select sequences with 
a minimum at the Hox half-site (A9Y10) (Figure 2). An additional 
mutant, ScrLinkGQ, that, in addition to having Antp’s 
R3G4R5Q6 motif, has Antp’s linker (residues in between 
the YPWM Exd interaction motif and the homeodomain; 
Figure S2A) in place of Scr’s linker, showed very similar behavior 
to ScrHis-12AGQ (Figures 2 and S2). Together, these data suggest 
that additional residues within and adjacent to the N-terminal 
arm, which do not make direct contact with the DNA (minor or 
major groove), play an important role in selecting Hox-specific 
MG topographies, likely by positioning the MG inserting side 
chains of Arg3, Arg5, and His-12. 

Mutants that Transfer Scr’s Ability to Read MG Shape 
to Antp 

The above experiments demonstrate that MG-inserting side 
chains in Scr are necessary for Scr’s ability to select sequences 
with local MG width minima. To test whether MG-recognizing residues 
are sufficient to confer Scr’s binding preferences to a different 
Hox protein, we introduced these residues into Antp, which normally 
prefers sequences with wider MG regions at A9Y10 (Figure 2). We 
created a series of Antp mutants that contained various combinations 
of Scr-specific amino acids in two regions, the linker and the N-terminal 
arm motif R3Q4R5T6 (Figure 3A). Remarkably, the 12-mer relative 
affinity profiles of these Antp mutants (in complex with Exd) gradually 
converged toward that of ScrWT upon the introduction of 
residues used for MG width recognition (Figures 3B-3D). All of 
the residues tested (Gln4, Thr6, His-12, and the linker) contributed 
to the convergence of Antp’s binding specificity toward that 
of Scr, with the most Scr-like mutant, AntpLinkQT, being nearly 
indistinguishable from ScrWT (Figures 3B and 3C). A direct comparison 
of the relative affinity for the red motif versus the Scr-preferred 
blue motif also revealed a gradual shift in preference 
toward the blue motif (Figure 3D). Thus, Scr-specific amino acids 
from its linker and N-terminal arm are sufficient to confer Scr’s 
binding specificity on another Hox protein. 

Figure 2. Loss of MG Width Preferences in 
the Absence of MG-Recognizing Residues 

Heat map of the average MG width at each position 
of 16-mers selected by each Exd-Hox heterodimer. 
Dark green represents narrow MG regions 
whereas white represents wider MG regions. The 
number of sequences analyzed for each complex 
is shown on the right. Black lines demarcate where 
Arg5 inserts into the MG (A5Y6) and, for ScrWT, 
where Arg3 and His-12 insert into the MG (A9Y10). 

To determine if these Antp mutants also share Scr’s MG shape 
preferences, we used DNAshape to predict the MG widths of 16-mers 
selected by these proteins. In general, the average MG 
width at A9Y10 of the sequences selected by the Antp mutant series 
became narrower, toward that of Scr, upon the introduction 
of Scr-specific residues (Figure 4A), where, with the exception of 
AntpHQ, each successive mutant selected sequences with a 
statistically significant narrowing of the average MG at these po- 
sitions (Figure S4A). On average, the differences in MG widths at 
these positions were larger for high-affinity sequences than for 
low-affinity sequences (Figure S4B). Taken together, these re- 
sults suggest that Scr residues Gln4, Thr6, His-12, and linker 
all contribute to the recognition of DNA shape. Moreover, these 
residues are sufficient to confer the shape preferences of Scr 
when inserted into another Hox protein. 

As an alternative way to analyze these data, we compared the 
binding specificities of each Antp variant geometrically by calcu- 
lating the Euclidean distances between the MG width profiles of 
sequences selected by each variant with the average MG width 
of those selected by Exd-AntpWT and Exd-ScrWT, respectively. 
The resulting density plots showed two occupancy peaks, one 
representing sequences that are more similar to those selected 
by AntpWT, and a second representing sequences that are 
more similar to those selected by ScrWT (Figure 4B). With the 
exception of AntpHQ, each successive Antp variant showed a 
gradual shift toward the ScrWT peak, with AntpLinkQT showing 
a nearly complete shift. Thus, key Scr-specific residues in the 
linker and N-terminal arm were sufficient to convert Antp’s shape 
preferences to those of Scr. 
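
One way to picture this geometric comparison is the short Python sketch below. It assumes that the Δ(Euclidean distance) score is the distance of a sequence's MG width profile to the ScrWT average minus its distance to the AntpWT average, which matches the sign convention in the Figure 4B legend (ScrWT-like sequences score negative); the exact definition used in the study may differ, and the profiles shown are hypothetical.

```python
import math


def delta_distance(profile, scr_mean, antp_mean):
    """Distance to the ScrWT mean profile minus distance to the AntpWT mean profile.

    Negative values mean the profile is closer to ScrWT, positive values closer to AntpWT.
    """
    return math.dist(profile, scr_mean) - math.dist(profile, antp_mean)


# Hypothetical two-position mean MG width profiles (angstroms).
scr_mean = [3.0, 4.0]
antp_mean = [5.0, 6.0]

print(delta_distance([3.2, 4.1], scr_mean, antp_mean))  # closer to ScrWT, so negative
print(delta_distance([4.9, 5.8], scr_mean, antp_mean))  # closer to AntpWT, so positive
```

A density plot of these scores over all sequences selected by one variant would then show how far that variant has shifted from the AntpWT peak toward the ScrWT peak.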

Antp Variants that Mimic Scr’s DNA Shape Preferences 
Activate an Scr-Specific Target In Vivo 

The above results demonstrate that shape readout, mediated 
by a limited number of Hox residues, is an essential component 
of DNA recognition by Exd-Hox heterodimers in vitro. But how 
relevant is this readout mechanism in vivo? To answer this 
question we examined the ability of these Antp variants to acti- 
vate fkh250-lacZ, an Scr-specific reporter gene that contains a 
binding site (AGATTAAT) with two MG width minima (Joshi 
et al., 2007). In otherwise wild-type embryos, fkh250-lacZ 
expression was limited to Scr-expressing cells in parasegment 
2 (PS2) (Ryoo and Mann, 1999) (Figure 5A). Ectopic expression 
of ScrWT using the prd-Gal4 driver activated fkh250-lacZ in 
segments outside PS2 (Figure 5B), and His-12 and Arg3 of 
Scr are required for this activation (Joshi et al., 2007). In contrast 
to Scr, ectopic expression of AntpWT did not activate fkh250-lacZ 
(Figure 5C). However, ectopic expression of AntpHQT 
resulted in modest fkh250-lacZ activation (Figure 5D), while 
ectopic expression of AntpLinkQT, the Antp mutant whose 
binding specificity most closely resembled Scr in vitro (Figures 
3 and 4), resulted in strong activation of fkh250-lacZ (Figure 5E). 
Thus, Antp mutants that prefer to bind sequences with two MG 
width minima in vitro, the normal topography of an Scr-specific 
binding site, also have the ability to activate an Scr-specific 
target gene in vivo. 

DNA Shape Features Improve Accuracy of Binding 
Specificity Predictions 

If shape readout is a direct and independent determinant of Hox-DNA 
binding specificity, we speculated that shape features of 
the target DNA could be used to improve quantitative predictions 
of relative binding affinities. To test this notion, we trained an 
L2-regularized multiple linear regression (MLR) model (Yang et al., 
2014) for each of the mutants and WT Hox proteins. We used 
10-fold cross validation in order to train and determine the accuracy 
of a given model, quantified as the coefficient of determination 
R². These MLR-derived R² values are robust as they are highly 
correlated with R² values derived using an alternative machine learning 
approach, support vector regression (ε-SVR) with a linear kernel 
(Figure S5; see Experimental Procedures for details) (Gordan 
et al., 2013; Zhou et al., 2015). 
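
As a rough illustration of this modeling strategy, the sketch below fits an L2-regularized linear regression on one-hot sequence features, with and without an appended shape feature, and scores each model by cross-validated R². Everything in it is an assumption made for illustration: the toy sequences, the synthetic "shape" feature and affinities, the regularization strength, the pooling of out-of-fold predictions, and the fact that the intercept is regularized along with the other weights. It is not the pipeline used in the study.

```python
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as four binary features per position."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge regression: solve (X^T X + lam I) w = X^T y, intercept as an extra column."""
    X = [row + [1.0] for row in X]
    d = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(w, X):
    return [sum(wi * xi for wi, xi in zip(w, row + [1.0])) for row in X]


def r_squared(y, yhat):
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(X, y, lam=1e-3, folds=10, seed=0):
    """R² of pooled out-of-fold predictions from k-fold cross validation."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        w = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i, p in zip(test, predict(w, [X[i] for i in test])):
            preds[i] = p
    return r_squared(y, preds)


# Synthetic data: the affinity depends on base content plus a dinucleotide-dependent
# "shape" term that a purely additive sequence model cannot capture exactly.
rng = random.Random(1)
seqs = ["".join(rng.choice(BASES) for _ in range(4)) for _ in range(200)]
shape = [1.0 if s[1:3] == "AT" else 0.0 for s in seqs]
y = [s.count("A") + 2.0 * sh for s, sh in zip(seqs, shape)]

X_seq = [one_hot(s) for s in seqs]
X_both = [x + [sh] for x, sh in zip(X_seq, shape)]
r2_seq = cross_validated_r2(X_seq, y)
r2_both = cross_validated_r2(X_both, y)
print(round(r2_seq, 3), round(r2_both, 3))
```

In this toy setting the sequence+shape model recovers the synthetic affinities almost exactly while the sequence-only model does not, which mirrors the logic of the comparison: a shape feature helps when it carries information that is not an additive function of the individual bases.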

Using MLR, addition of MG width to a model based only on 
nucleotide sequence resulted in a modest improvement in R² 
of on average 12% (Figures 6A and S6A). Like MG width, adding 
three other shape features one at a time, Roll, propeller twist 
(ProT), and helix twist (HelT), also led to a modest improvement 
in accuracy (Figure S6A). Inclusion of all four DNA shape features 
in combination further increased prediction accuracy (Figures 6B 
and S6A). The improvement in binding affinity prediction accu- 
racy, on average 26% when incorporating all four shape fea- 
tures, yielded the largest effect with high significance (p = 6 x 
10“^; Mann-Whitney U test). The addition of any combination 
of three shape features led to an intermediate increase in prediction 
accuracy, in some cases similar to that after addition of all 
four shape features (Figure S6A). These results suggest that all 
four features contribute to Exd-Hox-DNA target selection in a 
non-additive manner, consistent with the interdependency of 
these features (Olson et al., 1998). Thus, including DNA shape 
features in addition to MG width improves binding site predictions 
over models based only on nucleotide sequence. 

For comparison, we also assessed the benefit of adding 
shape features for the prediction of Hox monomer specificities. 
Interestingly, in this case the improvement in R² was, on average, 
only 6.4%, suggesting a larger role for DNA shape in conferring 
heterodimer specificity than monomer specificity (Figure S6B). 

DNA Shape Contributes to Binding Specificities in a 
Position-Specific Manner 

Next, we hypothesized that if shape features contribute to an 
improvement in binding specificity prediction, then it might be 
possible to localize this effect within the binding site. We trained 
models using the sequence of the entire binding site augmented 
by all four shape features at individual positions one at a time, re- 
sulting in a set of models that tested the contribution of shape at 
each position of the binding site. We compared these models to 
a sequence-only model and calculated a ΔR². This analysis highlighted 
the importance of DNA shape for predicting Exd-Hox 
binding specificities in the core, but not the flanks, of the binding 
site (Figure 6C). 

To analyze the role of DNA shape in a complementary manner, 
we trained shape-only models using the four shape features at all 
nucleotide positions, leaving out this information one position at 
a time, resulting in a set of models that assessed the relative 
importance of DNA shape at each position of the binding site. 
ΔR² values were calculated relative to a model that included the four 
shape features at all positions. In this analysis, prediction accuracy 
was expected to decrease most when shape features were 
removed from the model at positions that were important for 
shape readout. Interestingly, we detected the greatest effect at 
the A9 position of the Hox half-site, followed by slightly weaker 
effects at the adjacent Y10 position and the G4 position of the 
Exd half-site (Figure 6D). Eliminating shape features from the remaining 
positions had a smaller impact on the ability to predict 
binding specificities. 

Within each SELEX-seq data set, the sequences were most 
variable at the N8 position, raising the possibility that the success 
of these models might be driven in large part by this position. To 
test this idea and better assess the role of DNA shape throughout 
the binding site, we trained additional models in which we 
removed sequence information at the N8 position (“sequence-N8 
model”). Leaving out sequence information at the N8 position 
did not significantly affect the accuracy of a sequence+shape 
model, suggesting that sequence information at N8 is not essential 
for its performance (Figure S7A). When MG width information 
was added to the sequence-N8 model, the ability to predict binding 
specificities was greatly enhanced compared to the same 
model without MG width information (Figures S7B and S7C). 
These results argue that MG width information is more important 
than sequence at positions with a degenerate sequence signal, 
such as at N8, where direct readout is not playing a role. The 
removal of the confounding sequence information at this position 
uncovered MG width as an independent specificity determinant. 
Figure 3. Introducing Scr’s MG Width-Recognizing Residues into Antp Converts Its Binding Specificity to that of Scr 

(A) Amino acid sequences (from the Exd interaction motif, YPWM, through the N-terminal arm of the homeodomain) of Antp variants. Green highlights residues 
specific to AntpWT, and red highlights residues specific to ScrWT. Non-highlighted residues are common between the two Hox proteins. Numbering is relative to 
the first residue of Scr’s homeodomain. The rest of the protein is wild-type in all cases. 

(B) 12-mer relative affinities of binding sites selected by each Antp variant in complex with Exd are color-coded according to the ten most commonly observed 
Exd-Hox motifs. AntpWT and ScrWT are included to show the progression of the binding preferences from AntpWT toward ScrWT. 

(C) Comparative specificity plots of the relative affinity of sequences selected by Exd-ScrWT (y axis) and each Exd-Antp mutant (x axis). Each point represents a 
unique 12-mer that is color-coded according to the core 8-mer it contains. Gray points represent 12-mers that do not contain any of the ten most common cores. 
The black line indicates y = x. 
When all four shape features were added to the sequence-N8 
model at single positions one at a time, the contribution of DNA 
shape within the core motif was very apparent (Figure 7A), and 
significantly stronger than when the starting model included 
sequence information at the N8 position (compare with Figure 6C). 
If instead of all four DNA shape parameters only MG 
width was added position by position to the sequence-N8 model, 
the average improvement in R², while smaller, was most 
apparent at or adjacent to Y6N7 and A9Y10 (Figure 7B). Thus, 
although DNA shape is generally important within the entire 
core of the binding site, the contribution of MG width is strongest 
at the two AY regions, precisely where local minima in MG width 
were observed in the Exd-Hox X-ray structures (Joshi et al., 
2007) and SELEX-seq data (Figures 2 and 4). 

Taken together, quantitative predictions based on regression 
models indicated that shape features become important where 
sequence information is not well defined, more likely at positions 
that are not involved in base readout. In these cases, shape 
features contain more information than sequence alone, and 
removing the signal from sequence enables the quantitative 
modeling of the role of shape features on binding specificity. 

DNA Shape Features Discriminate Anterior from 
Posterior Hox Binding Specificities 

To understand to what extent shape features can help distinguish 
Exd-ScrWT from Exd-AntpWT binding specificities, we assigned 
a value of +1 to the top 50% of sequences selected by Exd-ScrWT 
and -1 to the top 50% of sequences selected by Exd-AntpWT 



Figure 4. Shape Readout Properties of Antp 
Variants with Scr-Specific Residues 

(A) Heat map of the average MG width at each 
position of all statistically significant 16-mers 
selected by each Exd-Hox complex. Dark green 
represents narrow MG regions whereas white 
represents wider MG regions. The number of sequences 
analyzed for each protein is shown on the 
right. Black lines demarcate where Arg5 inserts 
into the MG (A5Y6) and, for Scr, where Arg3 and 
His-12 insert into the MG (A9Y10). 

(B) Histogram representing the distribution of 
MG width similarities for each of the sequences 
selected by each Antp variant in comparison to 
those selected by ScrWT and AntpWT. The y axis 
represents the density of 16-mers at different 
Δ(Euclidean distance) scores (x axis). Sequences 
more similar to those selected by ScrWT receive a 
negative score, and sequences more similar to 
those selected by AntpWT receive a positive score. 

See also Figure S4. 

(see Experimental Procedures for details). 
We then used sequence- and shape- 
based models to evaluate the discrimina- 
tive power of the selected features. Using L2-regularized MLR 
and 10-fold cross validation, we calculated the area under the 
receiver-operating characteristic curve (AUC) as a criterion for a 
model to discriminate ScrWT-like from AntpWT-like binding spec- 
ificities. We found that MG width alone, without using sequence or 
additional shape features, discriminates between the binding 
specificities of both Exd-Hox complexes with high accuracy (Fig- 
ure S7D). Thus, MG width does not merely refine binding speci- 
ficity but is a powerful descriptor on its own, at least for discrimi- 
nating between these two Exd-Hox complexes. Classification 
models using other shape parameters performed similarly well 
(Figure S7D), indicating that a classification between two states 
is less sensitive than quantitative prediction of binding strength 
using regression models. Further, these results suggest that the 
qualitative differences that are apparent in the MG width heat 
maps (Figures 2 and 4A) reflect a quantitative difference in anterior 
and posterior Hox specificities. 
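
The AUC criterion used here can be computed directly from class labels and model scores. The sketch below uses the rank-based definition (the probability that a +1 sequence receives a higher score than a -1 sequence, counting ties as one half). The MG width values are hypothetical, and using the negated width at a single position as the score is an illustrative simplification rather than the regression-based classifier described above.

```python
def auc(labels, scores):
    """Area under the ROC curve from +1/-1 labels and real-valued scores."""
    pos = [s for lab, s in zip(labels, scores) if lab > 0]
    neg = [s for lab, s in zip(labels, scores) if lab < 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# +1 = selected by Exd-ScrWT, -1 = selected by Exd-AntpWT (toy labels).
labels = [+1, +1, +1, -1, -1, -1]
# Hypothetical MG widths at one position; a narrower groove is treated as more Scr-like.
mgw = [3.1, 3.4, 4.6, 4.4, 5.0, 5.2]
scores = [-w for w in mgw]

print(auc(labels, scores))
```

An AUC of 0.5 corresponds to no discrimination between the two classes and 1.0 to perfect separation.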

Next, we asked which positions in the binding site had the 
highest impact on this classification. To answer this question, 
we calculated the Pearson correlation between the class la- 
bels +1 and -1 for Exd-ScrWT and Exd-AntpWT, respectively, 
and MG width at each position (see Experimental Procedures 
for details). Several positions showed strong, either positive or 
negative, correlations that enabled the classification into 
ScrWT-like and AntpWT-like binding specificities (Figure 7C). 
Two regions showing a negative Pearson correlation aligned 
with the two MG width minima observed in the Exd-Scr co-crys- 
tal structure, and a region of positive Pearson correlation marked 



(D) Plots comparing the relative affinities of sequences containing a blue motif (TGATTAAT) (y axis) versus a red motif (TGATTTAT) (x axis) for ScrWT, AntpWT and 
Antp variants. Each point represents the relative affinities of a pair of 12-mers that are identical except for the position that makes it either a blue (TGATTAAT) or a 
red (TGATTTAT) motif. The black line indicates y = x, and the red line is a linear regression trend line. The slope of the trend line and coefficient of determination 
of the data are indicated. 
the region between these minima. This observation confirms that 
the core region is important for the differences in binding speci- 
ficity between paralogous Hox factors. Interestingly, not only is 
the AY region of the Hox half-site important, but the shape of 
the entire core, presumably due to the influence of all core posi- 
tions on the shape of this region. 
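
The position-wise correlation analysis can be sketched in a few lines of Python. The class labels follow the +1 (Exd-ScrWT) and -1 (Exd-AntpWT) convention described above; the three-position MG width profiles are hypothetical values chosen so that the two outer positions are narrower in the ScrWT-like sequences.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def positional_correlations(labels, profiles):
    """Correlate the class labels with MG width at each position of the profiles."""
    return [pearson(labels, column) for column in zip(*profiles)]


labels = [+1, +1, -1, -1]
profiles = [
    [3.0, 6.0, 3.5],  # ScrWT-like: narrow, wide, narrow
    [3.2, 5.8, 3.3],
    [5.0, 4.0, 5.1],  # AntpWT-like
    [5.2, 4.2, 4.9],
]
r = positional_correlations(labels, profiles)
print([round(v, 3) for v in r])
```

Negative correlations mark positions where ScrWT-like sequences have a narrower MG, and positive correlations mark positions where they have a wider one, as in the pattern described for Figure 7C.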

Finally, we used classification models to predict whether the 
DNA shape mutants defined in Figures 1, 3, and S2 tend to 
show ScrWT-like or AntpWT-like binding specificities. Here, a 
sequence was classified as ScrWT-like if the class label was pre- 
dicted to be >0, and as AntpWT-like if the class label was pre- 
dicted to be <0. This classification indicated a gradual change in 
the fraction of sequences selected by any of the mutants assigned 
as ScrWT- versus AntpWT-preferred sequences (Figure 7D). 
These data quantitatively confirm the qualitative observations 
shown above (Figures 2 and 4A) that MG width topography is an 
important binding specificity signal for Hox proteins. 

DISCUSSION 

Despite significant effort in the field, it is still not possible to 
accurately decipher the regulatory information that is encoded 



Figure 5. Scr’s MG Width Readout Residues 
Confer the Ability to Activate an Scr- 
Specific Target In Vivo when Incorporated 
into Antp 

(A) In wild-type embryos fkh250-lacZ is activated 
only in parasegment 2 (PS2), where endogenous 
Scr is expressed (arrowhead). In this and all 
panels, anterior is to the left. 

(B) Ectopic expression of ScrWT using prd-Gal4 
(visualized with red stripes of ectopic expression in 
the panel on the right) activates fkh250-lacZ 
anterior and posterior to PS2. Activation is stron- 
gest anterior to PS2 (bracket) and immediately 
posterior to PS2 (thick arrow), with weaker acti- 
vation in abdominal segments (thin arrows). 

(C) Ectopic expression of wild-type Antp does not 
activate fkh250-lacZ. 

(D) Ectopic expression of AntpHQT leads to weak 
ectopic fkh250-lacZ expression anterior and pos- 
terior to PS2 (thin arrows). 

(E) Ectopic expression of AntpLinkQT leads to 
activation both anterior and posterior to PS2. 
Activation is strongest anterior to PS2 (bracket) 
and immediately posterior to PS2 (thick arrow), 
with weaker activation in abdominal segments 
(thin arrows). 



in the DNA sequences of eukaryotic genomes 
(Slattery et al., 2014). In the work described here, we used a 
combination of in vitro, in vivo, and computational 
approaches to show that intrinsic DNA structural 
characteristics, collectively referred to as DNA shape, are being 
directly read by DNA binding proteins 
when they recognize their binding sites. 
Thus, analogous to mechanisms in which 
DNA base pairs are directly read by proteins via hydrogen bonds, 
the recognition of DNA shape independently contributes to both 
binding affinity and specificity. Using this information, we show 
that including DNA shape features significantly enhances the 
ability to predict DNA binding specificities and thus will greatly 
improve models for accurately predicting transcription factor 
binding in eukaryotic genomes. 

Separable Contributions of DNA Shape and Sequence to 
Protein-DNA Recognition 

Although several previous reports suggested the importance of 
DNA shape in protein-DNA recognition, all prior work was unable 
to definitively discriminate between the roles of DNA shape and 
sequence. Although DNA shape features, such as MG width, 
were previously found to contribute to binding specificity (Dror 
et al., 2014; Gordan et al., 2013; Lazarovici et al., 2013; Yang 
et al., 2014), here the roles of DNA sequence and shape have 
been separated and analyzed in an unbiased manner. To achieve 
this, we mutated Scr amino acid side chains that do not make 
direct base contacts in the major groove, but instead either insert 
into the MG (His-12, Arg3, Arg5) or indirectly influence these in- 
teractions (Gln4, Thr6, linker). The combination of SELEX-seq 

Figure 6. DNA Shape Features Improve
Quantitative Predictions of DNA Binding 
Specificities of Exd-Hox Heterodimers 

(A) Scatter plot representing the coefficient of 
determination obtained using a sequence-only 
model (x axis) compared to a model using 
sequence and MG width (y axis). Each point rep- 
resents a different Exd-Hox heterodimer and is 
color-coded as indicated. 

(B) Scatter plot representing the coefficient of 
determination R² obtained using a sequence-only
model (x axis) compared to a model using sequence 
and four DNA shape features (MG width, Roll, ProT 
and HelT) (y axis). Quantitative measures for the 
improvement of the prediction accuracy of the 
logarithm of relative binding affinities using shape- 
augmented models are provided in Figure S6. 

(C) Box plots illustrating the contribution from DNA 
shape features to model accuracy when shape 
features were added to a sequence model at each 
position individually. The effect on the coefficient 
of determination is shown for adding four 
shape features (MG width, Roll, ProT and HelT)
position-by-position to the sequence model. The
centerline of the box plots represents the median,
the edge of the box the first and third quartile, and
the whiskers indicate minimum/maximum values
within 1.5 times the interquartile range from the box.

(D) Box plots illustrating the contribution from DNA 
shape features to model accuracy when sequence 
features were removed. The effect on the coefficient
of determination ΔR² is shown for leaving out
four shape features (MG width, Roll, ProT, and
HelT) position-by-position from a shape-only 
model that does not contain any sequence infor- 
mation. The box plots are defined in (C). 

See also Figures S5 and S6. 



with high-throughput DNA shape analysis allowed us to show the 
effect of these mutations on the selection of DNA binding sites 
with distinct shape characteristics. Further, not only were these 
amino acid side chains necessary for conferring the DNA binding 
preferences of these proteins, they were sufficient to confer this 
specificity, both in vitro and in vivo, when grafted into a different 
Hox protein, Antp. These experiments effectively tease apart the 
contributions of shape readout from base readout. We speculate 
that the readout of DNA shape may be a general mechanism that 
transcription factors use to recognize their binding sites. More- 
over, for transcription factors that are members of large paralo- 
gous families, such as the Hox proteins, DNA shape may be 
essential for distinguishing between binding sites that are diffi- 
cult to discriminate based on base readout alone. 

Statistical Machine Learning Reveals DNA Structure- 
Based Binding Specificity Signals 

To complement and extend the in vitro and in vivo studies, we 
used statistical machine learning, in this case multiple linear 
regression (MLR), to computationally analyze the contributions 
of DNA sequence and shape. Using this approach we were 
able to (1) quantify the overall contribution of shape features to 
binding specificity and (2) compute the relative contributions of 
DNA shape and sequence at individual positions within the bind- 



ing site. Extensive experimental work, involving structure deter- 
mination and mutagenesis, represents the current standard 
approach for uncovering DNA readout mechanisms of transcrip- 
tion factors. The quantitative modeling introduced here suggests 
an alternate route for deriving such mechanistic information from 
high-throughput sequencing data. These methods will therefore 
likely be valuable when used to predict the DNA binding specific- 
ities of other transcription factors and when analyzing their inter- 
actions with genomes. 

To identify positions in the binding site where shape features 
contribute substantially to binding specificity, we used a form 
of feature selection in which we compared models with different 
feature sets by computing a ΔR² relative to a reference
model. We found that the shape features in the core of the 
Exd-Hox heterodimer binding site were important for paralogous 
binding specificity. This observation is distinct from previous 
observations for another family of transcription factors, basic he- 
lix-loop-helix (bHLH) factors, where shape features in regions 
flanking the core binding site play an important role in discrimi- 
nating binding specificities of related family members in yeast 
(Gordan et al., 2013) and human (Yang et al., 2014). Further, 
our feature selection approach indicates that shape features at 
the AY region of the Hox half-site were the most critical for deter- 
mining binding specificity. This finding agrees with qualitative 

Figure 7. Models that Deconvolve DNA
Sequence and Shape

(A) Removing sequence features at the Ns position
where sequence is least constrained across the
selected sequences from the sequence+shape
model further emphasizes the contribution of adding
DNA shape to model accuracy. Whereas removing
sequence information at this position has essentially
no effect on model accuracy (Figure S7A), adding
MG width to the sequence-Ns model has a large
effect on prediction accuracy (Figure S7B). Based
on this finding, the effect on the coefficient of
determination ΔR² is shown in box plots for adding
four shape features (MG width, Roll, ProT, and HelT)
position-by-position to the sequence-Ns model. The
centerline of the box plots represents the median,
the edge of the box the first and third quartile, and
the whiskers indicate minimum/maximum values
within 1.5 times the interquartile range from the box.

(B) Box plots illustrating the effect on the coefficient
of determination ΔR² for adding MG width information
position-by-position to the sequence-Ns
model emphasize the role of the AY and immediately
adjacent positions. The box plots are defined in (A).

(C) Pearson correlations (red) between MG width 
(MGW) and binding site labels (+1 for ScrWT-like 
versus -1 for AntpWT-like) track with the MGW 
pattern (blue) observed in the co-crystal structure 
(Joshi et al., 2007), emphasizing the important role 
of MGW in the core region of Exd-Hox binding site. 

(D) A sequence+shape classification model cap- 
tures the gradual change of binding specificities 
introduced by mutations of the N-terminal arm 
and linker sequences with some Exd-Hox mutant 
heterodimer specificities classified as Scr-like (red) 
and others as Antp-like (blue). 

See also Figure S7. 



observations in a previous study (Slattery et al., 2011) and in this 
work (Figures 2 and 4) that shape selections varied most sub- 
stantially at this position for both wild-type and mutant Hox pro- 
teins. While this was previously a qualitative observation, the 
current study shows the effect quantitatively. The machine 
learning and feature selection methods reveal that this informa- 
tion will likely provide a powerful approach when analyzing 
data from high-throughput binding assays for other transcription 
factors. In particular, it is noteworthy that we were able to 
derive structural mechanisms used by Hox transcription factors 
based on sequence data alone, without solving a 3D
structure. 

Broader Implications for Recognition of Genomic Target 
Sites by Transcription Factors 

Based on our findings, we propose that as more high- 
throughput DNA binding data become available (Hume et al., 
2015; Jolma et al., 2013; Zhu et al., 2011), DNA shape param- 
eters should be taken into consideration when analyzing and 
subsequently scanning genomes for DNA binding site prefer- 
ences. Further, although different families of transcription fac- 
tors may use DNA shape in various ways, this information 
may be used to inform binding site prediction algorithms. As 
shown here quantitatively, Exd-Hox heterodimers use distinct 



structural features in the DNA, such as local regions of narrow 
MG, to achieve DNA binding specificity. Because MG width 
minima are distinct structural motifs, we were able to separate 
their contributions to DNA recognition both biochemically, by 
mutating amino acids that recognize these motifs, and compu- 
tationally, by training models that include or exclude specific 
subsets of DNA features. For other protein families, the contri- 
bution of DNA structure may not be as readily separable as it is 
for Exd-Hox binding. For example, although previous work 
demonstrated a role for DNA shape in conferring the binding 
specificity of bHLH proteins, this effect was mediated by se- 
quences flanking the core binding site (E-box), where no known 
protein-DNA interactions (base or shape readout) occur (Gor- 
dan et al., 2013). In this case, the role of DNA shape may be 
biochemically inseparable from base readout because it is un- 
likely that a distinct structural motif is formed by the flanking 
sequences. 

Our results have implications for the design of binding site 
search and de-novo motif discovery methods, which currently 
most typically rely only on DNA base features (Weirauch 
et al., 2013). There are some examples where large sets of 
overlapping DNA structural features, which are highly interde- 
pendent from each other and inseparable from sequence, 
have been integrated in motif search algorithms (Hooghe 
et al., 2012; Maienschein-Cline et al., 2012; Meysman et al.,
2011). The results described here, however, suggest that for 
some transcription factor families, distinct structural motifs, 
which can be defined independently from sequence, such as 
MG topography, can be directly integrated in genome analysis 
tools as quantifiable search parameters. The ability to indepen- 
dently define and quantify the role of distinct structural motifs 
will likely yield more powerful algorithms that may help identify 
low affinity, high specificity Hox binding sites that are unrecognizable
with standard approaches (Crocker et al., 2015). Further,
machine learning approaches may also contribute to more accurate
models of cooperative transcription factor binding, for
example in the interferon-β enhanceosome (Chang et al., 2013),
or in vivo, where DNA shape has been identified as a predictive
feature for transcription factor binding (Barozzi et al., 2014).
We further propose that the computational approaches
described here will also be valuable for deconvolving and 
discovering the roles of DNA shape and sequence even for 
transcription factors such as the bHLH factors where DNA 
shape cannot be as readily separated biochemically from DNA 
sequence. The ability to quantitatively assess the distinct roles 
of DNA sequence and shape will therefore advance our ability 
to identify bona fide genomic binding sites and the ability to 
interpret eukaryotic genomes. 

EXPERIMENTAL PROCEDURES 

Oligonucleotides 

All oligonucleotides used in this study are listed in Table S1. 

Protein Purification 

Scr and Antp mutants were cloned using the QuickChange Site-Directed 
Mutagenesis Kit (Agilent) using his-tagged ScrWT (Joshi et al., 2007) and 
his-tagged AntpWT (Jaffe et al., 1997; Noro et al., 2006) as templates. 
His-tagged proteins were expressed in BL21 cells and purified using 
Cobalt chromatography. For the SELEX-seq experiments, “Exd” refers 
to Exd co-purified with the HM domain of Homothorax (Hth) (Noro 
et al., 2006). 

In Vivo Analysis 

All transgenic UAS lines were generated using the φC31 integration system
into the attP2 insertion site. UAS lines were crossed to flies containing
fkh250-lacZ on the second chromosome and prd-Gal4 on the third chromosome.
Embryos were collected at 25°C and stained using rabbit anti-β-galactosidase
(Cappell) and either mouse anti-Scr (gift from D. Andrews) or mouse
anti-Antp (8C11; DSHB).

SELEX-Seq 

All SELEX experiments were carried out as described (Riley et al., 2014; 
Slattery et al., 2011). In total, five 16-mer libraries were used for multiplexing
(Table S1). Sequencing was performed on an Illumina HiSeq 2000/2500.
The number of sequences analyzed for each protein is listed in Tables S2 
and S3. 

Inferring Relative Binding Affinities 

Fifth-order Markov models were constructed using Round 0 (R0) sequences
to predict the number of 12-, 14-, and 16-mer sequences in
each initial library as described (Riley et al., 2014; Slattery et al., 2011).
R3 data were used for all Hox variants in order to optimize counts and minimize
sampling error. 12-, 14-, and 16-mer relative binding affinities were
generated by taking the cubic root of the enrichment ratio (counts in R3
divided by expected counts as predicted using the Markov model derived
from R0 data).
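
The enrichment-to-affinity step above can be sketched in a few lines of Python. This is only an illustration: the function name, the toy counts, and the scaling to the best-bound k-mer are assumptions, and the expected R0 counts are taken as given rather than derived from a fifth-order Markov model.

```python
def relative_affinities(r3_counts, expected_r0_counts, rounds=3):
    """Map each k-mer to a relative affinity, scaled so the best k-mer is 1.0.

    r3_counts: observed counts after the final selection round.
    expected_r0_counts: expected counts in the initial library
        (assumed to come from a Markov model trained on R0 reads).
    rounds: number of selection rounds (3 gives the cubic root).
    """
    raw = {}
    for kmer, observed in r3_counts.items():
        expected = expected_r0_counts.get(kmer, 0.0)
        if expected <= 0:
            continue  # no usable expectation for this k-mer
        enrichment = observed / expected
        raw[kmer] = enrichment ** (1.0 / rounds)
    if not raw:
        return {}
    top = max(raw.values())
    # Scaling to the maximum is an assumption made for readability.
    return {kmer: value / top for kmer, value in raw.items()}


# Toy counts, invented for illustration only.
r3 = {"TGATTAAT": 800, "TGATTTAT": 2700, "TGACAAAT": 100}
r0_expected = {"TGATTAAT": 100.0, "TGATTTAT": 100.0, "TGACAAAT": 100.0}
affinities = relative_affinities(r3, r0_expected)
```

In this toy case the enrichment ratios are 8, 27, and 1, so the cubic roots are 2, 3, and 1 before scaling.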



High-Throughput DNA Shape Prediction 

All sequences selected in R3 of SELEX with a count of at least 25 were aligned 
based on the TGAYNNAY (Exd-Hox heterodimers) or TAAT (Hox monomers) 
motifs. Four DNA structural features were derived for these sequences from 
a high-throughput DNA shape prediction method (Zhou et al., 2013). Euclidean
distance was used to compare MG width profiles of sequences selected by 
Hox mutants to the average MG width at all positions of sequences selected 
by the Hox WTs. See Extended Experimental Procedures for details. 
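
As a rough illustration of the profile comparison described above, the following sketch averages MG width profiles position by position and computes a Euclidean distance to that average. The helper names and the numbers are hypothetical, and the profiles are assumed to be pre-aligned lists of equal length.

```python
import math


def mean_profile(profiles):
    """Average aligned MG width profiles position by position."""
    count = len(profiles)
    return [sum(column) / count for column in zip(*profiles)]


def euclidean_distance(profile_a, profile_b):
    """Euclidean distance between two aligned profiles of equal length."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Invented MG width values (in angstroms) for illustration only.
wt_profiles = [[5.0, 4.0, 3.0, 4.0], [5.0, 4.0, 5.0, 4.0]]
wt_mean = mean_profile(wt_profiles)
mutant_profile = [5.0, 4.0, 7.0, 8.0]
distance = euclidean_distance(mutant_profile, wt_mean)
```

A small distance would indicate that a mutant selects sites whose MG width profile resembles the wild-type average.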

Regression Models for Predicting Binding Specificities
Quantitatively 

To predict the relative binding affinity for sequences bound by the Hox mono- 
mers and Exd-Hox heterodimers, we trained L2-regularized multiple linear 
regression (MLR) models (Yang et al., 2014). A 10-fold cross-validation was 
performed with an embedded 10-fold cross-validation on the training set to 
determine the optimal λ parameter. We trained models that (1) encoded the
nucleotide sequence of each of the bound sequences as binary features 
(sequence models), (2) encoded different combinations of the DNA shape 
features MG width, ProT, Roll, and HelT (shape models), and (3) combined 
nucleotide sequence and DNA shape features at the corresponding position 
(sequence+shape models). We calculated the coefficient of determination 
between the predicted and experimentally determined logarithm of relative 
binding affinities using 10-fold cross validation. We used all 14-mer sequences
from R3 of the selection with a count of >50, aligned based on the TGAYNNAY
core motif for heterodimers, and the logarithm of the relative binding affinity as
response variable. ΔR²s were defined as described in the text. See Extended
Experimental Procedures for details and access to the source code for DNA 
shape prediction and feature mapping. 
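
The feature encodings and the ΔR² comparison can be sketched as follows. This is a minimal sketch and not the authors' source code: the function names are hypothetical, the regression fit itself is omitted, and R² is assumed to be the usual 1 − SS_res/SS_tot.

```python
BASES = "ACGT"


def one_hot(sequence):
    """Encode each nucleotide as four binary features (A, C, G, T)."""
    features = []
    for base in sequence:
        features.extend(1.0 if base == b else 0.0 for b in BASES)
    return features


def sequence_plus_shape(sequence, shape_by_position):
    """Append per-position shape values to the binary sequence features."""
    if len(shape_by_position) != len(sequence):
        raise ValueError("need one list of shape values per position")
    features = one_hot(sequence)
    for values in shape_by_position:
        features.extend(values)
    return features


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def delta_r_squared(observed, predicted_model, predicted_reference):
    """Change in R² of a model relative to a reference model."""
    return r_squared(observed, predicted_model) - r_squared(
        observed, predicted_reference
    )


# Invented values for illustration only.
features = sequence_plus_shape("TGAT", [[5.0], [4.5], [4.0], [3.5]])
observed = [1.0, 2.0, 3.0, 4.0]
sequence_only_prediction = [1.5, 1.5, 3.5, 3.5]
sequence_shape_prediction = [1.0, 2.0, 3.0, 4.0]
gain = delta_r_squared(observed, sequence_shape_prediction, sequence_only_prediction)
```

A positive ΔR² means that the added features improved the fit relative to the reference model. In practice the predictions would come from cross-validated L2-regularized regression, not from hand-written lists.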

Classification Models for Distinguishing Binding Specificities 

To classify Hox binding specificities, we aligned 14-mers selected by Exd- 
ScrWT (assigned the label +1) or Exd-AntpWT (assigned the label -1) accord- 
ing to the presence of a single core motif TGAYNNAY. We trained classification 
models using L2-regularized MLR and used the resulting models to classify the 
top 50% aligned binding sites preferred by the mutants. The models were eval- 
uated based on this training data using L2-regularized MLR and 10-fold cross- 
validation, and area under the receiver-operating characteristic curve (AUC) 
was used as performance measure. See Extended Experimental Procedures 
for details. 
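
The labeling and evaluation scheme above can be illustrated with a short sketch. The function names and scores are hypothetical, and the AUC is computed here by direct pairwise comparison, which is an assumption about implementation, not a description of the authors' code.

```python
def classify(predicted_label):
    """Assign a site to a class by the sign of its predicted label."""
    if predicted_label > 0:
        return "ScrWT-like"
    if predicted_label < 0:
        return "AntpWT-like"
    return "unassigned"


def auc(labels, scores):
    """AUC as the probability that a +1 site outscores a -1 site.

    Ties contribute 0.5, which matches the rank-based definition.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == -1]
    if not positives or not negatives:
        raise ValueError("need at least one site of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented labels and model outputs for illustration only.
labels = [1, 1, 1, -1, -1]
scores = [0.9, 0.4, -0.2, 0.1, -0.7]
performance = auc(labels, scores)
calls = [classify(s) for s in scores]
scr_like_fraction = calls.count("ScrWT-like") / len(calls)
```

Applied to the sites preferred by a mutant, the fraction of ScrWT-like calls gives one number per mutant, which is the kind of quantity summarized in Figure 7D.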

SUMMARY

Research over the past decade has suggested 
important roles for pseudogenes in physiology and 
disease. In vitro experiments demonstrated that 
pseudogenes contribute to cell transformation 
through several mechanisms. However, in vivo evi- 
dence for a causal role of pseudogenes in cancer 
development is lacking. Here, we report that mice en- 
gineered to overexpress either the full-length murine 
pseudogene Braf-rs1 or its pseudo “CDS” or 
“3' UTR” develop an aggressive malignancy resem- 
bling human diffuse large B cell lymphoma. We 
show that Braf-rs1 and its human ortholog, BRAFP1,
elicit their oncogenic activity, at least in part, as
competitive endogenous RNAs (ceRNAs) that elevate
BRAF expression and MAPK activation in vitro and
in vivo. Notably, we find that transcriptional or 
genomic aberrations of BRAFP1 occur frequently in 
multiple human cancers, including B cell lymphomas. 
Our engineered mouse models demonstrate the 
oncogenic potential of pseudogenes and indicate 
that ceRNA-mediated microRNA sequestration may 
contribute to the development of cancer. 

INTRODUCTION 

Over the past few years, remarkable progress has been made in 
establishing long non-coding RNAs (lncRNAs) as important regulators
of various biological processes. Given their critical roles,
it is not surprising that aberrant expression and/or function of
lncRNAs are implicated in the development of diseases such
as cancer (Gutschner and Diederichs, 2012).

Pseudogenes, a sub-class of lncRNA genes that developed
from protein-coding genes but have lost the ability to produce 
proteins, have long been viewed as non-functional genomic 
relicts of evolution (Poliseno, 2012). However, the vast majority 
of pseudogenes have protein-coding parental counterparts 
with which they share high sequence homology, which 
enables pseudogenes to participate in posttranscriptional regu- 
lation of their parental genes. Mechanisms of parental gene 
regulation include the formation of endogenous siRNAs (Tam 
et al., 2008; Watanabe et al., 2008), recruitment of regulatory 
proteins by pseudogene antisense RNAs to complementary 
sites in the parental gene to modulate chromatin remodeling 
and transcription (Hawkins and Morris, 2010; Johnsson et al., 
2013), and competition for RNA-binding proteins or the transla- 
tion machinery (Bier et al., 2009; Chiefari et al., 2010; Han et al., 
2011). 

We recently proposed that the high sequence homology en- 
ables pseudogenes to compete with their parental genes for a 
shared pool of common microRNAs (miRNAs) (Poliseno et al., 
2010), thus regulating the latter’s expression as competitive 
endogenous RNA (ceRNAs) (Salmena et al., 2011). This mecha- 
nism is of particular relevance to cancer where pseudogenes are 
aberrantly expressed (Kalyana-Sundaram et al., 2012). Specif- 
ically, we demonstrated that pseudogenes of the frequently 
mutated cancer genes PTEN and KRAS function as ceRNAs 
in vitro (Poliseno et al., 2010). Moreover, we and others reported 
that mRNAs and non-coding RNAs may serve as ceRNAs that 
regulate each other through miRNA-dependent crosstalk 
(Cazalla et al., 2010; Cesana et al., 2011; Franco-Zorrilla et al.,
2007; Hansen et al., 2013; Karreth et al., 2011; Libri et al., 2012;
Marcinowski et al., 2012; Memczak et al., 2013; Sumazin et al., 
2011; Tay et al., 2011; Wang et al., 2013), suggesting that pseu- 
dogenes regulate the expression of their parental genes in the 
context of larger networks of protein-coding and non-coding 
ceRNAs. 

While sufficient data exist to demonstrate pseudogene functions
in vitro, in vivo evidence for the regulatory activity of
pseudogenes, either as ceRNAs or by any of the other above-mentioned
mechanisms, is lacking, and their role in disease
progression is correlative. Here, we describe a causal role for
the BRAF pseudogene in the development of cancer.

RESULTS 

The BRAF Pseudogene Regulates BRAF in a 
Dicer1-Dependent Manner

The BRAF pseudogene (BRAFP1) is overexpressed in various 
tumor types (Zou et al., 2009; Kalyana-Sundaram et al., 2012), 
suggesting that it may contribute to cancer development. We 
have shown that pseudogenes are able to regulate expression 
of their parental genes through sequestration of shared miRNAs 
(Poliseno et al., 2010), and BRAFP1-mediated elevation of BRAF
may promote MAPK signaling and tumorigenesis. MiRNA predictions
revealed that murine Braf-rs1 (Gm18189) and B-Raf
are targeted by 54 and 114 miRNA families, respectively, 53 of
which they have in common. Similarly, human BRAFP1 and
BRAF are targeted by 60 and 48 miRNA families, respectively,
and share 40 (Figures S1A-S1D, Table S1). Thus, the BRAF
pseudogene may operate as a ceRNA for BRAF in mice and humans.
Indeed, ectopic expression of Braf-rs1 in NIH 3T3 fibroblasts
and BRAFP1 in human PC9 and HeLa cancer cells
elevated BRAF protein and ERK phosphorylation (Figures 1A
and S1E). Importantly, B-Raf was critical for this effect, as the
Braf-rs1-induced increase in pERK was negated by genetic deletion
of B-Raf in fibroblasts (Figure 1B). Moreover,
expression of the BRAF pseudogene increased proliferation of
NIH 3T3, PC9, and HeLa cells (Figures 1C, 1D, and S1F). Moderate
B-Raf overexpression was sufficient to increase pERK
expression, proliferation, and anchorage-independent growth
of NIH 3T3 fibroblasts (Figures S1G-S1I), indicating that
Braf-rs1-mediated elevation of B-Raf may be sufficient for the
observed phenotype.

To test whether the effect of the BRAF pseudogene on BRAF 
expression and proliferation rates was dependent on miRNAs, 
we utilized cell lines lacking functional Dicer1, a ribonuclease
critical for miRNA biogenesis and whose deficiency results in
drastically reduced levels of mature miRNAs (Cummins et al.,
2006; Ravi et al., 2012). Ectopic expression of Braf-rs1 increased
expression of B-Raf and pERK and elevated proliferation of
Dicer1-proficient murine sarcoma cells, but not that of isogenic
Dicer1 knockout cells (Figures 1E and 1F). Similarly, overexpression
of BRAFP1 in Dicer1-proficient human HCT116 colon cancer
cells increased expression of BRAF and pERK and elevated
proliferation, and these effects were abrogated in isogenic
Dicer1 mutant HCT116 cells (Figures 1G and 1H). Thus, the
BRAF pseudogene-induced effects are dependent on BRAF
and Dicer1.



The BRAF Pseudogene Regulates BRAF as a 
Competitive Endogenous RNA 

The finding that the BRAF pseudogene mediates its effect 
through mature miRNAs suggests that it may function as a 
ceRNA. To test this directly, we co-expressed BRAFP1 with a 
human BRAF-3' UTR-luciferase reporter in Dicer1-proficient
and -deficient HCT116 cells. BRAFP1 elevated the activity of
the BRAF 3' UTR-luciferase reporter in a Dicer1-dependent
manner (Figure 2A), further supporting the notion that the cross- 
talk is mediated by mature miRNAs. To validate this result, we 
tested several predicted shared miRNAs in 3' UTR-luciferase re- 
porter assays. Three out of ten murine miRNAs (miR-134, miR- 
543, and miR-653) significantly repressed Braf-rs1 and B-Raf 
luciferase reporters (Figure 2B), suggesting that the crosstalk 
may be mediated at least in part by these three miRNAs. 

Next, we determined the ability of Braf-rs1 to decoy the dual 
targeting miRNAs miR-134, miR-543, and miR-653 from lucif- 
erase reporters carrying miRNA response elements (MREs). 
Braf-rs1 regulated the expression of the luciferase reporters, 
especially at lower miRNA concentrations (Figure 2C).
Braf-rs1-mediated sequestration of the least potent of the three
dual targeting miRNAs, miR-543, had the most robust effect on 
luciferase reporter activity (Figure 2C). These data suggest that 
both potency and abundance of the miRNAs may be important 
determinants for ceRNA crosstalk. In addition, Braf-rs1 was 
able to sequester endogenous miR-653, miR-134, and miR- 
543 from the respective luciferase-MRE reporters, and mutation 
of the MREs in Braf-rs1 abrogated this effect (Figure 2D). 
Similarly, four out of nine human miRNAs (miR-30a, miR-182, 
miR-876, and miR-590) were able to repress BRAF- and
BRAFP1-luciferase reporters (Figure S2A). miR-30a, miR-182,
and miR-876 were also efficiently sequestered from the respective
MRE-luciferase reporters by BRAFP1, and mutation of these
miRNA-binding sites reduced BRAFP1's activity as a miRNA
sponge (Figure S2B).

Generation of TRE-BPS Mice 

As Braf-rs1 regulates the expression of B-Raf and MAPK
signaling, we sought to investigate whether aberrant Braf-rs1
expression is oncogenic in vivo. To this end, we generated a
transgenic allele containing murine Braf-rs1 under the control
of a doxycycline (Dox)-inducible Tet-response element (TRE)
and targeted it to the collagen A1 locus using Flp recombinase-mediated
genomic integration (Beard et al., 2006) (Figures
S2C and S2D). We isolated mouse embryonic fibroblasts (MEFs)
from TRE-Braf-rs1 (henceforth referred to as TRE-BPS) mice to
confirm that expression of the Braf-rs1 allele regulates B-Raf.
Infection of MEFs with a tTA-expressing retrovirus resulted in
6- to 18-fold induction of Braf-rs1 expression (Figures 2E and
S2E), as well as increased levels of B-Raf and pERK (Figure 2F)
and proliferation (Figure 2G), confirming that the transgenic allele
elicits effects similar to ectopic expression of Braf-rs1.

We used TRE-BPS MEFs to analyze the stoichiometry of B-Raf 
and Braf-rs1. First, we determined the absolute number of tran- 
scripts by qPCR using plasmids carrying Braf-rs1 and B-Raf as 
standards (Figure S2E). In TRE-BPS MEFs infected with a control 
retrovirus, B-Raf molecules were 13- to 26-fold more abundant
than Braf-rs1, while in tTA-infected cells, the B-Raf:Braf-rs1 ratio

Figure 1. The BRAF Pseudogene Regulates
BRAF in a Dicer1-Dependent Manner

(A) Western blot demonstrating increased BRAF
and pERK expression upon ectopic BRAF pseudogene
expression in mouse (NIH 3T3, left) and
human (PC9, right) cells.

(B) Western blot of fibroblasts overexpressing
Braf-rs1 or control (yellow fluorescent
protein [YFP]) in the presence or absence of
Adeno-Cre infection.

(C) Increased proliferation of NIH 3T3 fibroblasts
upon ectopic Braf-rs1 expression.

(D) Increased proliferation of PC9 cells upon
ectopic BRAFP1 expression.

(E and F) Western blot (E) and proliferation assay
(F) of Dicer1-proficient and Dicer1 knockout murine sarcoma
cells overexpressing Braf-rs1.

(G and H) Western blot (G) and proliferation assay
(H) of Dicer1-proficient and Dicer1 mutant human HCT116
colon cancer cells overexpressing BRAFP1.

Error bars represent mean + SD. *p < 0.05; **p < 0.01;
***p < 0.001. See also Figure S1.

Figure 2. The BRAF Pseudogene Functions as a miRNA Sponge

(A) BRAF 3' UTR-luciferase reporter assay in Dicer1-proficient and Dicer1 mutant HCT116 cells expressing BRAFP1 or control (YFP).

(B) Luciferase reporter assay using the 3' UTRs of B-Raf and Braf-rs1 to analyze repression by the indicated miRNA mimics. miR-141 serves as a negative control.

(C) Braf-rs1 sequesters miRNAs to regulate MRE-Luc reporter activity. HEK293T cells were co-transfected with MRE-Luc reporter constructs, the respective
miRNA mimics, and Braf-rs1-L277 or empty control L277 plasmids. The luciferase activity relative to a Luc reporter without MRE is shown.

(D) Luciferase activity measured in HEK293T cells co-expressing MRE-Luc reporters (Luc-653, Luc-134, or Luc-543) and wild-type or MRE mutant Braf-rs1 or
empty vector.

(E) qPCR showing tTA-induced Braf-rs1 expression in TRE-BPS MEFs.

(F) Western blot for B-Raf and pERK in tTA-infected TRE-BPS MEFs.

(G) Proliferation of TRE-BPS MEFs shown in (F).

Error bars represent mean ± SD. *p < 0.05; **p < 0.01; ***p < 0.001. See also Figure S2.

was between 1.3 and 2.5 (Figure S2E). RNA-sequencing (RNA-seq)
analysis confirmed Braf-rs1 induction and found B-Raf:Braf-rs1
ratios in a range similar to that determined by qPCR (Figure
S2F and data not shown). Next, we determined the number of
molecules of miR-653, miR-134, and miR-543 in TRE-BPS MEFs
by qPCR using standard curves. MiRNA expression was not
significantly affected upon transgene induction (Figure S2G,
Table S2). miR-653 was expressed at extremely low levels, likely
precluding it from Braf-rs1/B-Raf ceRNA crosstalk in MEFs.
Additional predicted miRNAs that are expressed in MEFs (Table
S2) but were not further validated may also contribute to crosstalk.
Hence, the stoichiometry of B-Raf, transgenic Braf-rs1, and
some dual-targeting miRNAs fits well within the optimal cross- 
talk criteria that we have recently established (Ala et al., 2013), 
supporting the hypothesis that overexpression of Braf-rs1 in- 
creases B-Raf through its ceRNA activity. 

Braf-rs1 Causes Diffuse Large B Cell Lymphoma

To induce global overexpression of Braf-rs1 in vivo, TRE-BPS 
mice were crossed to CAG-rtTA3 mice (Premsrirut et al., 2011), 
and compound mutant animals and single mutant controls 
were placed on a Dox-containing diet at 3 weeks of age (Fig- 
ure S3A). qPCR analysis after 4 weeks of Dox administration 
confirmed Braf-rs1 overexpression in all organs tested (Fig- 
ure S3B). Following 4 months of Dox treatment, TRE-BPS; 
CAG-rtTA3 mice became moribund and had to be sacrificed 
after a median survival of 421 days (Figure 3A), while none of 
the single-mutant animals or compound mutants maintained 
on a regular diet developed similar symptoms. All moribund 
TRE-BPS; CAG-rtTA3 mice presented with splenomegaly (Fig- 
ures 3B and 3C) and enlarged lymph nodes (Figure 3K). 

Histological analysis revealed large tumor nodules involving 
the splenic white pulp (Figures 3D and 3E). Tumors consisted 
of large lymphoid cells admixed with numerous plasmablasts 
and plasma cells (Figure 3F). The mitotic rate was very high (Fig- 
ure 3F), and the proliferation rate was markedly increased 
compared to normal white pulp (Figures 3G and S3C). 

We determined the immunophenotype of the splenic tumors 
by flow cytometry when TRE-BPS; CAG-rtTA3 mice succumbed 
to the malignancy. The cell population expressing surface B220 
was decreased in spleens (Figure 3H), while Gr-1+/Mac-1+ cells
were slightly increased and CD3+ cells were unchanged (Figures
3I and 3J). Lymph nodes displayed more B220+ cells, while CD3+
cells were less abundant (Figures 3L and 3M). Similar results 
were obtained when calculated as fold change relative to con- 
trols (Figures S3D-S3H). By immunohistochemistry, tumor cells 
stained positively for CD45R/B220 and IgG (Figures 4A and 4C) 
and negatively for CD3 (Figure 4B). Moreover, tumors were 
negative for the germinal center marker Bcl6 (Figure 4D) and 
strongly positive for Mum1 (Figure 4E), while residual germinal
centers adjacent to the tumors were Bcl6 positive and Mum1
negative (Figures 4D and 4E). The decrease of B220 expression
on the surface of tumor cells reflected the marked plasmacellular
differentiation, as shown by the abundance of IgG+ cells. Overall,
this phenotype was consistent with post-germinal center diffuse 
large B cell lymphoma. 

We next determined the abundance of Braf-rs1, B-Raf, and
miRNA molecules in spleens after short-term Dox exposure
(10 days) and in lymphomas and control spleens after long-term
Dox exposure. While endogenous Braf-rs1 expression
was between 6- and 115-fold lower than B-Raf, expression of
transgenic Braf-rs1 was comparable to B-Raf (Figures S3I- 
S3L). Expression of miR-134, miR-543, and miR-653 was not 
affected by Braf-rs1 overexpression (Figures S3M and S3N). 



Similar to MEFs, miR-653 was expressed at low levels, while 
miR-134 and miR-543 were expressed at levels that are 
amenable to ceRNA crosstalk (Figures S3M and S3N). 

Aggressive Lymphomas Are Transplantable and Depend 
on Braf-rs1 Expression

Macroscopic lymphoma nodules were commonly observed in 
the kidneys, livers, and lungs of TRE-BPS; CAG-rtTA3 mice (Fig- 
ure 4F and data not shown), and histological analysis revealed 
microscopic organ infiltration by lymphoma cells in all animals 
(Figures 4G-4I). Such tumor cells displayed a CD45R/B220+
and Mum1+ phenotype identical to the cells infiltrating spleens
and lymph nodes (Figures 4J-4O). Additionally, heterozygous
loss of Pten reduced the median survival of TRE-BPS; CAG- 
rtTA3 mice to 172 days (data not shown). 

To further assess the tumorigenicity of Braf-rs1-induced lymphomas,
we analyzed their transplantation potential. NSG mice
injected with TRE-BPS; CAG-rtTA3 spleen cells had to be sacrificed
100-150 days after transplantation due to deteriorating
health. Moreover, NSG mice transplanted with TRE-BPS;
CAG-rtTA3; Pten+/− lymphoma cells had to be sacrificed after
80 days (data not shown). NSG recipients exhibited infiltrating
lymphoma cells in spleens, livers, lungs, and kidneys (Figure 5A).
These results suggest that Braf-rs1-induced lymphomas are
transplantable and highly aggressive.

We next determined whether continuous expression of Braf-rs1
was required for tumor maintenance. TRE-BPS; CAG-rtTA3 mice
receiving a Dox diet were monitored by palpation and were taken
off Dox chow once splenomegaly became apparent. Spleen
sizes of these animals were subsequently measured using
high-resolution ultrasound. Notably, enlarged spleens of all
TRE-BPS; CAG-rtTA3 mice reduced in size, while spleens of control
mice were unaffected (Figure 5B). Moreover, 40 days after
weaning the mice off Dox chow, the histology (Figures 5C and
5D) and Mum1 expression pattern (Figures 5E and 5F) of the white
pulp of TRE-BPS; CAG-rtTA3 spleens were comparable to controls,
confirming that lymphomas had largely regressed.

Braf-rs1 Regulates B-Raf In Vivo

To determine whether Braf-rs1 functions as a ceRNA for B-Raf
in vivo, we examined Braf-rs1-induced lymphomas for expression
of B-Raf and pERK. Notably, Braf-rs1-induced lymphomas
displayed increased levels of B-Raf and pERK (Figures 5G, 5H,
and S4A) compared to adjacent normal white pulp. The difference
in B-Raf and pERK levels between tumors and normal white pulp
in the same mouse is likely due to positive selection of B cells that
express the highest levels of Braf-rs1, B-Raf, and pERK.

We next analyzed whether MAPK signaling is critical for the
growth of Braf-rs1-induced lymphomas. To this end, we treated
NSG mice that were transplanted with Braf-rs1-induced
lymphoma cells with the MEK inhibitor GSK1120212. Notably,
treatment with GSK1120212 markedly impaired the ability of
transplanted lymphomas to colonize the livers of NSG mice (Figure 5I).
Moreover, Dox withdrawal reduced B-Raf and pERK
expression in tumors, indicating that increased MAPK activation
is stimulated by continuous Braf-rs1 expression (Figure S4B).
These data suggest that Braf-rs1 elicits its oncogenic effects,
at least in part, through B-Raf and the MAPK pathway.
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Figure 3. Braf-rs1 Expression In Vivo Results in a Lymphoid Malignancy 

BPS, TRE-BPS; CAG-rtTA3 mice on Dox; control, TRE-BPS, or CAG-rtTA3 mice on Dox here and in all figures. 

(A) Survival of BPS and control mice. 

(B and C) Size (B) and weight (C) of BPS and control mouse spleens. 

(D and E) Photomicrograph of a spleen from a control (D) and BPS mouse (E). 

(F) Higher-magnification photomicrograph showing tumor cells in a BPS spleen. White arrowheads denote plasma cells, and black arrowhead highlights a mitotic 
figure. 

(G) Quantification of Ki-67 staining. 

(H-J) Flow cytometry-based quantification of splenic B220+ (H), CD3+ (I), and Gr-1+/Mac-1+ (J) populations.

(K) Size of control and BPS mouse lymph nodes. 

(L and M) Flow cytometry-based quantification of B220+ (L) and CD3+ (M) populations in lymph nodes.

Error bars represent mean ± SD. See also Figure S3. 
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Figure 4. Braf-rs1 Induces Diffuse Large B Cell Lymphoma 

(A) CD45R/B220 staining. Higher-magnification inset shows staining of large lymphoma cells.

(B) CD3 staining. Higher-magnification inset shows positive staining of reactive T cells.

(C) IgG staining. Arrowheads denote plasma cells.

(D) Bcl6 staining. Lymphoma cells are negative, and residual germinal center is positive.

(E) Mum1 staining. Tumor cells are positive, and residual germinal center is negative.

(F) Photograph of control and BPS kidneys. Arrowheads denote tumor nodules.

(G-I) H&E staining of kidney (G), liver (H), and lung (I) sections from BPS mice.

(J-L) CD45R/B220 immunohistochemistry of kidney (J), liver (K), and lung (L) sections from BPS mice.

(M-O) Mum1 immunohistochemistry of kidney (M), liver (N), and lung (O) sections from BPS mice.
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Figure 5. Lymphomas Are Transplantable, Are Addicted to Braf-rs1 Expression, and Activate the MAPK Pathway

(A) Transplanted lymphoma cells infiltrating the spleen, liver, kidney, and lungs of NSG recipient mice. 

(B) Spleen size measurements after Dox withdrawal. 

(C-F) H&E staining (C and D) and Mum1 immunohistochemistry (E and F) of BPS and control mouse spleens depicted in (B) after Dox withdrawal. 

(G) Immunohistochemical staining for B-Raf of lymphoma and adjacent normal white pulp in BPS spleen. 

(H) Immunohistochemical staining for pERK of lymphoma and adjacent normal white pulp in BPS spleen. 

(I) Percentage of liver infiltration by TRE-BPS; CAG-rtTA3; Pten+/− lymphoma cells transplanted into NSG mice in response to GSK1120212 treatment. Each
symbol represents a liver section, and each recipient mouse is color coded.

Error bars represent mean ± SD. ***p < 0.001. See also Figure S4.



The “CDS” and “3' UTR” of Braf-rs1 Possess Oncogenic
Potential 

Based on Braf-rs1's ability to decoy miRNAs, we reasoned that
shorter fragments of Braf-rs1 may be able to crosstalk with 



B-Raf through a subset of the shared miRNA pool. Such frag- 
ments would elicit similar phenotypes provided that the crosstalk 
remains robust. Alternatively, different portions of Braf-rs1 
may regulate distinct ceRNA networks and yield distinct, 
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Figure 6. Braf-rs1^CDS and Braf-rs1^3'UTR Possess Oncogenic ceRNA Activity Similar to Full-Length Braf-rs1

(A and B) Weights of spleens (A) and inguinal lymph nodes (B) of the indicated mouse strains after 6 months on Dox.

(C) Survival of TRE-BPS^3'UTR and TRE-BPS^CDS mice.

(D) Table summarizing the penetrance, median survival, and disease onset of TRE-BPS, TRE-BPS^3'UTR, and TRE-BPS^CDS mice.

(E) H&E staining of Braf-rs1^3'UTR-induced lymphoma. White arrowheads indicate plasma cells, and black arrowhead indicates mitotic figure.

(F-J) Immunohistochemical staining of Braf-rs1^3'UTR-induced lymphoma for Ki-67 (F), CD45R/B220 (G), CD3 (H), Bcl6 (I), and Mum1 (J).

Error bars represent mean ± SD. See also Figure S5.
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B-Raf-unrelated phenotypes. To experimentally examine these 
possibilities, we generated two additional Dox-inducible mouse
models overexpressing either the “CDS” or the “3' UTR” of Braf-rs1
(Figures S2C and S2D). TRE-BPS^CDS and TRE-BPS^3'UTR
mice were crossed to CAG-rtTA3 mice and their offspring fed a
Dox-containing diet for 6 months. Remarkably, both TRE-BPS^CDS
and TRE-BPS^3'UTR mice displayed enlarged spleens
and lymph nodes similar to full-length TRE-BPS mice (Figures
6A and 6B). Braf-rs1^3'UTR overexpression resulted in splenomegaly
and reduced survival (Figures 6C, 6D, and S5C) similar to
TRE-BPS mice. The histology and immunophenotype of lymphomas
in TRE-BPS^3'UTR mice were similar to those of full-length
TRE-BPS animals (Figures 6E-6J, S5A, and S5B), indicating
that Braf-rs1^3'UTR overexpression elicits a phenotype similar to
full-length Braf-rs1. TRE-BPS^CDS mice developed lymphomas
with a reduced penetrance and aggressiveness compared to
mice overexpressing full-length Braf-rs1 or Braf-rs1^3'UTR (Figures
6C and 6D and data not shown). Similarly, infection of TRE-BPS^CDS
and TRE-BPS^3'UTR MEFs with tTA-pMSCV induced
Braf-rs1^CDS and Braf-rs1^3'UTR expression (Figure S5D), but only
Braf-rs1^3'UTR elicited a significant effect on B-Raf expression
and proliferation, while the Braf-rs1^CDS-induced effects were
negligible (Figures S5D-S5G). Braf-rs1^CDS and Braf-rs1^3'UTR
may regulate distinct ceRNA networks, but the finding that the
severity of the phenotype elicited by the three Braf-rs1 variants
correlated with their ability to deregulate B-Raf provides compelling
support to the notion that Braf-rs1 operates as a proto-oncogenic
ceRNA through B-Raf in B cells.

BRAFP1 Is an Oncogenic ceRNA in 
Human Cancer 

Overexpression of human BRAFP1 in- 
creased BRAF and pERK levels as well 
as proliferation of human cells (Figures 
1A, 1D, 1G, and 1H), suggesting that BRAFP1 may be an onco- 
gene in human cancer. To explore this possibility further, we first 
determined whether BRAFP1 is expressed in human DLBCL. 
Interestingly, BRAFP1 expression was not found in primary hu- 
man B cells (Figures 7A and S6A) but was detected in 30% of 
human primary DLBCL and 20% of human DLBCL cell lines (Figures
7A and S6A). Similar observations have been made in the
thyroid, where BRAFP1 was expressed in some tumors, but 
not in normal tissue (Zou et al., 2009). Moreover, BRAFP1 was 
expressed in melanoma, prostate cancer, and lung cancer cell 
lines (Figure S6A). 

We next interrogated The Cancer Genome Atlas’s (TCGA) cBio 
Cancer Genomics Portal for genomic abnormalities of the locus 
containing BRAFP1. As pseudogene data are not yet included in
TCGA, we focused our analysis on two protein-coding genes 
flanking BRAFP1: ZDHHC15 and MAGEE2 (Figure S6B). 
Notably, concurrent copy-number gains and amplification of 
ZDHHC15 and MAGEE2 were observed in numerous cancer 
types (Figure S6B). Importantly, BRAFP1 expression could be 
detected in such cancer types (Kalyana-Sundaram et al., 
2012). Thus, both transcriptional mechanisms and genomic 



Cell 161, 319-332, April 9, 2015 ©2015 Elsevier Inc. 327 











Figure 7. BRAFP1 in Human Cancer 

(A) Percentage of primary human B cells, primary human DLBCL, and human DLBCL cell lines expressing BRAFP1 as determined by qPCR analysis. 
(B and C) Positive correlation of BRAFP1 and BRAF expression in human DLBCL primary tumors (B) and cell lines (C). 

(D-G) Western blot for BRAF and pERK in OCI-Ly18 (D), H1299 (E), PC9 (F), and OCI-Ly1 (G) cells in response to BRAFP1 silencing.

(H-K) Proliferation of OCI-Ly18 (H), H1299 (I), PC9 (J), and OCI-Ly1 (K) cells in response to BRAFP1 silencing.

(L) Western blot for BRAF and pERK in OCI-Ly1 cells overexpressing BRAFP1.



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aberrations may lead to abnormal BRAFP1 expression in human 
cancer. 

Our experiments in human cell lines indicate that BRAFP1 may 
operate as a ceRNA to regulate BRAF expression. Accordingly, 
analysis of RNA sequencing data revealed that BRAFP1 and 
BRAF expression were positively correlated in primary human 
DLBCL tumors and DLBCL cell lines (Figures 7B and 7C). We
also analyzed whether the expression of dual-targeting miRNAs 
correlates with BRAF and/or BRAFP1 expression. While miR- 
590 expression negatively correlated with BRAFP1 levels, miR- 
30a, miR-182, and miR-876 showed no correlation (Figure S6C). 
Thus, similar to our observations in TRE-BPS MEFs, expression 
of BRAFP1 and BRAF may not affect miRNA abundance in 
human DLBCL. 

To functionally validate the oncogenic function of BRAFP1 in 
human cancer, we designed shRNAs to specifically silence 
expression of endogenous BRAFP1 (Figure S6H). Knockdown 
of BRAFP1 in OCI-Ly18 DLBCL cells and H1299 and PC9 lung 
cancer cells reduced the expression of BRAF and pERK (Figures 
7D-7F and S6I-S6K). BRAFP1 silencing moderately reduced 
BRAF mRNA levels in OCI-Ly18 and PC9 cells, but not in 
H1299 cells, suggesting that the mechanism of miRNA-medi- 
ated regulation of BRAF varies between cell lines. Importantly, 
the BRAFP1 hairpins had no effect on BRAF and pERK expres- 
sion in OCI-Ly1 DLBCL cells that do not express endogenous 
BRAFP1 (Figure 7G). Moreover, BRAFP1 silencing reduced proliferation
of OCI-Ly18, H1299, and PC9 cells, but not of OCI-Ly1
cells (Figures 7H-7K). Remarkably, silencing of endogenous
BRAFP1 elicited a significant effect on BRAF expression in
OCI-Ly18, H1299, and PC9 cells even though it is ~15- to
~30-fold less abundant than BRAF (Figures S6D and S6E).
Intriguingly, BRAFP1 was turned over significantly faster than
BRAF (Figure S6F), suggesting that the relatively low expression
levels of BRAFP1 may be due to its short half-life. We also determined
the abundance of miR-30a, miR-182, and miR-876 in OCI-Ly18,
H1299, and PC9 cells and found that their expression
levels were in the same range as those of BRAFP1 and BRAF
(Figure S6G).

Overexpression of BRAFP1 in three human DLBCL cell
lines lacking endogenous BRAFP1 expression, SU-DHL-4,
Karpas422, and OCI-Ly1 (Figures S6A and S6L), resulted in
elevated BRAF and pERK levels (Figures 7L and S6M). Moreover,
BRAFP1 overexpression increased proliferation of all three
DLBCL cell lines (Figures 7M, S6N, and S6O) and resulted in
increased growth of xenotransplanted OCI-Ly1 cells in the
bone marrow of NSG recipients (Figure 7N). These data suggest
that BRAFP1 has oncogenic properties in human cancer.

DISCUSSION 

We investigated whether pseudogenes exert critical functions in 
the context of a whole organism and whether their perturbation 
contributes to the development of disease. We focused on the 



BRAF pseudogene, as it exists in humans and mice and is de- 
regulated in cancer (Kalyana-Sundaram et al., 2012; Zou et al., 
2009). Our study establishes the BRAF pseudogene as a potent
proto-oncogene that can elicit a phenotype resembling human 
diffuse large B cell lymphoma. Remarkably, no additional 
engineered mutations were required to drive this phenotype, 
and lymphomas completely regressed upon Dox withdrawal, 
emphasizing the oncogenic potential of the BRAF pseudogene. 
While it is possible that the BRAF pseudogene elicits its effects 
through more than one mechanism or pathway, the fact that 
both the CDS and the 3' UTR of Braf-rs1 displayed a similar 
phenotype to full-length Braf-rs1, albeit with different severity,
supports the notion that Braf-rs1 functions as a ceRNA to regulate
B-Raf in vivo (Figure 7O). Whether the oncogenic activity of
Braf-rs1 also requires additional ceRNA targets or non-ceRNA- 
related mechanisms will be the focus of future studies. 

Several groups, including ours, have generated mathematical 
models to quantitatively assess the response of a ceRNA 
network to perturbations (Ala et al., 2013; Bosia et al., 2013; Figliuzzi
et al., 2013). More recently, such models were used in
conjunction with miRNA predictions, RNA sequencing, and
target site occupancy analyses to more accurately characterize
miRNA competition (Bosson et al., 2014; Denzler et al., 2014;
Jens and Rajewsky, 2014). Intriguingly, these studies yielded
disparate conclusions. It was proposed that ceRNA crosstalk
is unlikely to occur upon physiological changes of ceRNA
expression based on these models’ estimates of the number of
additional target sites required to achieve significant expression
changes of other targets (Denzler et al., 2014; Jens and Rajewsky,
2014). By contrast, using Argonaute iCLIP and RNA-seq,
Sharp and colleagues determined that a relatively low number 
of additional target sites could elicit ceRNA crosstalk when the 
number of miRNA molecules and high-affinity target sites 
approaches equimolarity (Bosson et al., 2014). Interestingly, 
BRAFP1 is several-fold less abundant than BRAF, yet its 
silencing significantly diminished BRAF expression levels, 
MAPK signaling, and proliferation. BRAF and its pseudogene
harbor high-affinity sites for the murine and human miRNAs 
that we validated as potential mediators of the ceRNA crosstalk 
(miRs-134, -543, and -653 and miRs-30a, -182, -876, respec- 
tively). Notably, the levels of these miRNAs in mouse spleens 
and lymphomas as well as human cancer cell lines are amenable 
to miRNA competition in accordance with the model proposed 
by Bosson et al. Thus, a ceRNA effect of BRAFP1 that is solely 
based on miRNA competition may be compatible with this 
model. 
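The dependence of crosstalk on the ratio of miRNA molecules to target sites can be illustrated with a toy calculation. The sketch below is not the model of Bosson et al. or any other cited study; it assumes a single miRNA binding a pool of identical sites at simple mass-action equilibrium, and the function names, the dissociation constant, and all numerical values are hypothetical and in arbitrary units.

```python
import math


def free_mirna(mirna_total, sites_total, kd):
    """Free miRNA at equilibrium for one miRNA and a pool of identical sites.

    Solves m + sites_total * m / (kd + m) = mirna_total for m >= 0,
    i.e. the quadratic m^2 + (kd + sites_total - mirna_total) * m - mirna_total * kd = 0.
    """
    b = kd + sites_total - mirna_total
    return (-b + math.sqrt(b * b + 4.0 * mirna_total * kd)) / 2.0


def site_occupancy(mirna_total, sites_total, kd):
    """Fraction of target sites bound by miRNA at equilibrium."""
    m = free_mirna(mirna_total, sites_total, kd)
    return m / (kd + m)


def occupancy_drop(mirna_total, sites_total, extra_sites, kd):
    """Decrease in site occupancy caused by adding competing sites."""
    before = site_occupancy(mirna_total, sites_total, kd)
    after = site_occupancy(mirna_total, sites_total + extra_sites, kd)
    return before - after


# Hypothetical values: 100 miRNA molecules, kd = 1, 20 added competing sites.
for sites in (10.0, 100.0, 1000.0):
    drop = occupancy_drop(100.0, sites, 20.0, 1.0)
    print(f"sites={sites:6.0f}  occupancy drop={drop:.3f}")
```

Under these assumptions, the same number of added sites lowers occupancy most when miRNA and sites are close to equimolar, and much less when either is in large excess.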

Importantly, the studies by the groups of Sharp, Stoffel, and 
Rajewsky focused on ceRNA regulation that is mediated by a 
single miRNA. However, ceRNA pairs in general, and gene/pseu- 
dogene pairs in particular, share numerous miRNAs. This in- 
creases the likelihood of shared miRNAs being present at cross- 
talk-favoring levels, and we have shown that ceRNA crosstalk is 
enhanced when it is mediated by more miRNAs (Ala et al., 2013).



(M) Proliferation of OCI-Ly1 cells. 

(N) Percentage of human CD19+ transplanted OCI-Ly1 cells in bone marrow of NSG recipients.

(O) Model depicting the proposed oncogenic action of the BRAF pseudogene. 

Error bars represent mean ± SD. *p < 0.05; **p < 0.01; ***p < 0.001. See also Figure S6.
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As discussed by Jens and Rajewsky, several factors that may 
influence ceRNA crosstalk are neglected in current mathematical
models. For instance, subcellular co-localization of miRNAs
and competing targets may result in local concentrations that 
favor ceRNA crosstalk. In addition, target degradation may 
trap miRNAs in P bodies or other sites of RNA decay, thus ampli- 
fying the ceRNA regulation by removing miRNAs from the avail- 
able pool. Intriguingly, BRAFP1 is degraded significantly faster 
than BRAF (Figure S6F); however, whether this influences the 
ceRNA activity of BRAFP1 remains to be determined. Future 
improvements to both quantitative measurements and mathe- 
matical models will undoubtedly provide a better understanding 
of the molecular conditions required for ceRNA crosstalk. How- 
ever, it should be noted that ceRNA crosstalk can be predicted 
solely based on the MRE overlap of transcripts (Chiu et al., 
2014; Karreth et al., 2011; Sumazin et al., 2011; Tay et al., 
2011), suggesting that miRNA competition is indeed the central 
component of ceRNA crosstalk. 

Human hematopoietic malignancies are associated with 
“overdosage” of the X chromosome, which harbors the BRAF 
pseudogene locus. This can occur through XIST deletion and X 
chromosome duplication in women with myeloid cancers, and 
extra X chromosomes have been noted in a variety of hemato- 
poietic cancers of both sexes (Dewald et al., 1989; Dierlamm 
et al., 1995; Heinonen et al., 1999; Paulsson et al., 2010; 
Rack et al., 1994; Yamamoto et al., 2002), including DLBCL 
(Bea et al., 2005; Monni et al., 1996; Morin et al., 2013). Our analysis
revealed that a variety of human cancers harbor copy-number
gains and amplifications of the locus containing BRAFP1. It is
therefore tempting to speculate that increased X dosage and the 
potentially associated overexpression of BRAFP1 contribute to 
the development and/or progression of cancer cases harboring 
more than one active copy of the X chromosome. Moreover, 
elevated expression of BRAFP1 has been observed in cancers 
other than DLBCL (Kalyana-Sundaram et al., 2012; Zou et al., 
2009), and transcriptional deregulation may thus be another 
means to deregulate BRAFP1 expression. Whether BRAFP1 
has oncogenic potential in other organs such as the thyroid re- 
mains to be determined through the use of tissue-specific over- 
expression of the BRAF pseudogene. 

Interestingly, several observations suggest that the BRAF 
pseudogenes evolved independently in mice and humans. 
First, they reside in non-syngeneic locations: on chromosome
10 in mice and on the X chromosome in humans. Second, 
the 3' UTR of the BRAF gene is not conserved between mice 
and humans; importantly, however, the BRAF pseudogene 
3' UTRs display high sequence homology to their parental 
counterparts in the respective species. Third, murine Braf-rs1 
arose from an alternative B-Raf splice form that is specific to 
mice (Karreth et al., 2009). The likely parallel yet converging 
evolution of BRAFP1 and Braf-rs1 and the fact that the 
gene-pseudogene crosstalk is mediated by different miRNAs 
in the two species suggest that their functions may be 
conserved. Indeed, the frequent BRAFP1 copy-number gains 
and transcriptional activation of BRAFP1 in human cancers 
as well as our silencing and overexpression experiments indi- 
cate that our findings in the mouse are of relevance to human 
disease. 



It was recently proposed that human BRAFP1 encodes a peptide
with the ability to activate the MAPK pathway (Zou et al.,
2009). We neither detected any peptide translation by the mouse 
or human BRAF pseudogenes nor could we detect robust asso- 
ciation of Braf-rs1 with actively translating ribosomes (data not 
shown). These findings suggest that Braf-rs1 is not translated 
into an oncogenic peptide but, rather, exerts its function as an
RNA transcript. This is further supported by the finding that
TRE-BPS^3'UTR mice display a more severe phenotype compared
to TRE-BPS^CDS mice, which suggests that the effects of Braf-rs1
on B-Raf are primarily mediated through its 3' UTR. The BRAFP1
ORF predicted by Zou et al., however, localizes to the CDS 
portion of the pseudogene. 

Pseudogenes were considered genomic junk for decades, but 
their retention during evolution argues that they may possess 
important functions and that their deregulation could contribute 
to the development of disease. Indeed, several lines of evidence 
have associated pseudogenes with cellular transformation (Poli- 
seno, 2012). Our study shows that aberrant expression of a 
pseudogene causes cancer, thus vastly expanding the number 
of genes that may be involved in this disease. Moreover, our 
work emphasizes the functional importance of the non-coding 
dimension of the transcriptome and should stimulate further 
studies of the role of pseudogenes in the development of 
disease. 

EXPERIMENTAL PROCEDURES 
Flow Cytometry 

Mice were euthanized and single-cell suspensions from spleens and lymph
nodes were prepared by passing organs through 100 μm cell strainers in 2%
FBS/PBS, centrifuged and re-suspended in 1-2 ml ACK red cell lysis buffer
(GIBCO). Red blood cells were lysed on ice for 1 min. Cell suspensions were
then washed in 2% FBS/PBS, centrifuged and re-suspended in 1 ml 2%
FBS/PBS. For hematopoietic lineage analysis, we used monoclonal antibodies
specific for the following: CD3e-PE (145-2C11), B220-FITC (RA3-6B2), Gr-1-APC
(RB6-8C5), and CD11b-PE/Cy7 (M1/70). All antibodies were from
eBioscience. To assess cell viability, cells were incubated with DAPI prior to
FACS analysis. All staining mixtures were analyzed on a BD LSR II flow cytometer
(Becton Dickinson). Resulting profiles were further processed and
analyzed using the FlowJo 8.7 software. For fold change quantifications,
both mutant and control cell populations were normalized to the average of
the controls. At least five mice from different litters were used for all flow
cytometry experiments.
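The fold change normalization described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study; the function name and the example percentages are hypothetical and do not represent measured data.

```python
from statistics import mean, stdev


def fold_change_vs_controls(control_values, mutant_values):
    """Normalize both groups to the average of the controls."""
    if not control_values:
        raise ValueError("at least one control value is required")
    control_mean = mean(control_values)
    if control_mean == 0:
        raise ValueError("control mean is zero; fold change is undefined")
    controls_fc = [v / control_mean for v in control_values]
    mutants_fc = [v / control_mean for v in mutant_values]
    return controls_fc, mutants_fc


# Hypothetical percentages of a gated population (illustrative values only).
controls = [50.0, 55.0, 45.0, 52.0, 48.0]
mutants = [30.0, 25.0, 35.0, 28.0, 32.0]

controls_fc, mutants_fc = fold_change_vs_controls(controls, mutants)
print(f"control: {mean(controls_fc):.2f} +/- {stdev(controls_fc):.2f}")
print(f"mutant:  {mean(mutants_fc):.2f} +/- {stdev(mutants_fc):.2f}")
```

Because the controls are divided by their own average, the control group always has a mean fold change of 1, and its spread remains visible alongside that of the mutants.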

Tissue Fixation, H&E, and IHC 

Tissues were fixed in 4% paraformaldehyde overnight and embedded in
paraffin according to standard procedures. 5 μm sections were either stained
with hematoxylin & eosin or with the following antibodies: CD45R/B220
(ab64100, Abcam), CD3 (ab5690, Abcam), Ki-67 (RM-9106-S1, Thermo
Scientific), IgG (BA2000, Vector), BRAF (sc-9002, Santa Cruz), pERK (4373,
Cell Signaling), Bcl-6 (5650, Cell Signaling), and Mum1 (sc-6059, Santa
Cruz). Organs from at least five mice from different litters were used for all
stainings.

Cell Culture 

HCT 116 and HeLa were from ATCC, Dicer mutant HCT 116 cells were provided
by B. Vogelstein, and DiceF'-^^ and DiceP'^ mouse sarcoma cells were provided
by P. Sharp and were cultured in DMEM containing 10% FCS and
2 mM L-glutamine. PC9, H1299, H441, and H2009 (all provided by L. Cantley),
OCI-Ly8, OCI-Ly3, RCK8, and Val were grown in RPMI-1640 containing 10%
FCS and 2 mM L-glutamine. SU-DHL-4, SU-DHL-8, Karpas422, OCI-Ly7,
Toledo, OCI-Ly1, and OCI-Ly18 cells were grown as previously described
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(Chapuy et al., 2013). Cells were regularly tested with MycoAlert (Lonza) to 
ascertain that cells were not infected with mycoplasma. 

Plasmids, Transfection, and Virus Infection 

Human BRAFP1 was cloned into pLenti-CMV-GFP-Puro (Addgene 25873) 
and pCDNA3, and mouse Braf-rs1 was cloned into pCCL.sin.PPT.hPGK. 
GFP.Wpre (L277, L. Naldini) or pCDNA3-neo. pMSCV-tTA (Addgene #18783) 
was used to induce Braf-rs1 expression in TRE-BPS MEFs. Lipofectamine 
2000 was used for plasmid transfection. Lentivirus or retrovirus was produced 
in HEK293T LentiX cells (Clontech) co-transfected with VSVG, pMDL, and Rev 
or Eco helper plasmids, respectively. Viral supernatants were filtered and cells 
infected in the presence of 5 μg/ml polybrene.

Proliferation Assays

For proliferation assays, 2x10"^ cells were plated in four 12-well plates in trip- 
licates. Every day, one plate was fixed with 4% paraformaldehyde and stained 
with Crystal Violet. The dye was extracted with 10% acetic acid and its absor- 
bance determined at OD595. For suspension cells, 1x10"^ cells were plated in 
triplicates in round-bottom 96-well plates and counted every day for 5 days. 
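A growth curve can be derived from the daily triplicate readings as sketched below. Expressing each day relative to the first time point is an assumption made here for illustration and is not stated in the procedure; the function name and OD595 values are hypothetical.

```python
from statistics import mean


def relative_growth(od_by_day):
    """Mean OD595 per day, expressed relative to the earliest day.

    od_by_day maps a day number to the list of replicate readings
    taken from the plate fixed on that day.
    """
    days = sorted(od_by_day)
    baseline = mean(od_by_day[days[0]])
    if baseline <= 0:
        raise ValueError("baseline absorbance must be positive")
    return {day: mean(od_by_day[day]) / baseline for day in days}


# Hypothetical triplicate OD595 readings from four plates (illustrative only).
readings = {
    1: [0.10, 0.11, 0.09],
    2: [0.20, 0.22, 0.18],
    3: [0.40, 0.44, 0.36],
    4: [0.80, 0.88, 0.72],
}

growth = relative_growth(readings)
for day, value in growth.items():
    print(f"day {day}: {value:.2f}-fold")
```

The same function applies to daily cell counts from suspension cultures, since it only assumes replicate measurements grouped by day.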

Luciferase Assays 

HCT116 cells were transfected with 150 ng of psiCHECK2 vector or 
psiCHECK2-humanBRAF 3' UTR and 1 μg human BRAFP1 constructs using
Lipofectamine 2000. To validate miRNA targeting, 3' UTRs of murine and hu- 
man gene and pseudogene were cloned into psiCHECK2. 5x10^ HEK293T 
cells were transfected in 48-well plates with 20 ng of psiCHECK2 reporter 
and 100 nM miRNA mimic (QIAGEN). To test the ceRNA activity of the BRAF 
pseudogenes, 5 x 10"^ HEK293T cells were transfected in 48-well plates 
with 20 ng of psiCHECK2 reporter and 250 ng of murine Braf-rs1-L277 vector
or human BRAFP1-pCDNA3 and 1-2 nM miRNA mimic. In all transfections,
firefly luciferase activity was used as a normalization control for transfection 
efficiency. 48 hr after transfection, luciferase activities were measured consec- 
utively with the dual luciferase reporter system (Promega). 
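The normalization of reporter activity can be sketched as follows. The sketch assumes that the 3' UTR reporter signal is Renilla luciferase, as in the psiCHECK2 design, and that results are expressed relative to a reference condition; the function names and well readings are hypothetical.

```python
from statistics import mean


def normalized_activity(renilla, firefly):
    """Reporter (Renilla) signal divided by the firefly normalization signal."""
    if firefly <= 0:
        raise ValueError("firefly signal must be positive")
    return renilla / firefly


def relative_reporter_activity(sample_wells, reference_wells):
    """Mean normalized activity of sample wells relative to reference wells.

    Each well is a (renilla, firefly) pair of luminescence readings.
    """
    sample = mean(normalized_activity(r, f) for r, f in sample_wells)
    reference = mean(normalized_activity(r, f) for r, f in reference_wells)
    return sample / reference


# Hypothetical luminescence readings (illustrative only).
reference_wells = [(1000.0, 500.0), (1200.0, 600.0), (900.0, 450.0)]
mimic_wells = [(500.0, 500.0), (660.0, 600.0), (405.0, 450.0)]

relative = relative_reporter_activity(mimic_wells, reference_wells)
print(f"relative reporter activity: {relative:.2f}")
```

Dividing by the firefly signal within each well corrects for well-to-well differences in transfection efficiency before conditions are compared.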

Western Blot 

Cells were lysed in RIPA buffer containing HALT protease and phosphatase inhibitors
(Sigma). 20 μg total protein was separated on 4%-12% Bis-Tris acrylamide
NuPAGE gradient gels in MOPS SDS buffer (Invitrogen). The following
antibodies were used: HSP90 (610419, BD), BRAF (sc5284, Santa Cruz), 
pERK (9101, Cell Signaling), and tERK (9102, Cell Signaling). Secondary 
HRP-tagged antibodies and ECL detection reagent were from Amersham. 
ImageJ software was used for quantification.
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SUMMARY 

NF-κB is a key transcriptional regulator involved in
inflammation and cell proliferation, survival, and 
transformation. Several key steps in its activation 
are mediated by the ubiquitin (Ub) system. One un- 
characterized step is limited proteasomal processing 
of the NF-κB1 precursor p105 to the p50 active
subunit. Here, we identify KPC1 as the Ub ligase 
(E3) that binds to the ankyrin repeats domain of 
p105, ubiquitinates it, and mediates its processing 
both under basal conditions and following signaling. 
Overexpression of KPC1 inhibits tumor growth likely 
mediated via excessive generation of p50. Also, 
overabundance of p50 downregulates p65, suggest- 
ing that a p50-p50 homodimer may modulate 
transcription in place of the tumorigenic p50-p65. 
Transcript analysis reveals increased expression of 
genes associated with tumor-suppressive signals. 
Overall, KPC1 regulation of NF-κB1 processing
appears to constitute an important balancing step 
among the stimulatory and inhibitory activities of 
the transcription factor in cell growth control. 

INTRODUCTION 

The NF-κB family of transcription factors is involved in regulation
of a variety of genes that control the immune and inflammatory
response, cell survival and death, proliferation, and differentiation.
Recently, 150 years after Rudolf Virchow discovered the
infiltration of tumors with leukocytes and proposed a linkage between
chronic inflammation and malignant transformation, it
has been shown that the mechanism(s) that underlies this linkage
is mediated largely by the NF-κB family of transcription factors
(Ben-Neriah and Karin, 2011; DiDonato et al., 2012). NF-κB is
overexpressed in numerous tumors. It upregulates expression
of anti-apoptotic genes such as IAPs, cell-cycle promoters,
and growth factors and their receptors (DiDonato et al., 2012).
Nevertheless, in some cases NF-κB was shown to display strong
tumor-suppressive characteristics (Perkins, 2012; Pikarsky and
Ben-Neriah, 2006). For example, it is involved in regulation of
activation-induced apoptosis of T lymphocytes (Ivanov et al.,
1997) and in inducing cell-cycle arrest and cell death caused
by repression of Bcl2, XIAP, Bcl-XL, Cyclin D1, and c-Myc that
occurs after cell damage. The arrest and death are mediated
by p52 dimers (Barre et al., 2010; Barre and Perkins, 2007).
Also, it was shown that NF-κB1−/− cells accumulate alkylator-induced
mutations, and NF-κB1−/− mice develop more lymphomas
following alkylating agent-induced DNA damage,
again suggesting that NF-κB1 can act as a tumor suppressor
(Voce et al., 2014).

The family members are mostly heterodimers where one of the
subunits, p52 or p50, is the product of limited, ubiquitin- and
proteasome-mediated processing of a longer (and inactive)
precursor, p100 or p105, respectively (Betts and Nabel, 1996;
Fan and Maniatis, 1991; Palombella et al., 1994). The other subunit
is typically a member of the Rel family of proteins (RelA-p65,
RelB, or c-Rel). At times, p50 and p52 can generate homodimers
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that cannot act as transcriptional activators since they lack a 
transactivation domain present in the Rel proteins. In unstimu- 
lated cells, the NF-kB dimers are sequestered in the cytosol 
attached to ankyrin repeats (ARs) of kB inhibitory proteins 
(kB, Bcl3, p1 00, and p1 05). A broad array of extracellular signals 
stimulate degradation of the kB proteins, resulting in transloca- 
tion of the dimers to the nucleus where they initiate different tran- 
scriptional programs (Rahman and McFadden, 2011). 

Proteasomal processing of p105 occurs under both basal conditions 
and following stimulation and requires prior ubiquitination 
(Cohen et al., 2004; MacKichan et al., 1996). One element that 
was shown to be important in the processing is a long Gly-Ala 
repeat in the middle of p105 that may serve as a proteasomal 
“stop signal” (Lin and Ghosh, 1996). In addition to processing, 
p105 can also undergo complete degradation, releasing NF-κB 
dimers anchored to its C-terminal ARs domain. Following stimulation, 
p105 is phosphorylated on serine residues 927 and 932 
by IκB kinase (IKKβ) (Salmeron et al., 2001). This modification 
recruits the beta-Transducin Repeat Containing Protein (βTrCP) 
E3 (Orian et al., 2000), resulting in complete degradation of the 
molecule (Heissmeyer et al., 2001). The ligase(s) involved in 
processing of p105 under basal conditions as well as following 
stimulation has remained elusive. 

In the present study, we identified KIP1 ubiquitination-promoting 
complex (KPC) as the Ub ligase that is involved in both basal 
and signal-induced processing of p105. KPC is a heterodimer 
made of KPC1 (RNF123) and KPC2 (UBAC1). It was shown to 
degrade the cyclin-dependent kinase inhibitor p27Kip1 in the 
G1 phase of the cell cycle (Kamura et al., 2004). KPC1 is a 
RING-finger protein that serves as the ligase. KPC2 interacts 
with ubiquitinated proteins and with the proteasome via its two 
Ub-associated domains and a Ub-like domain, acting as a shuttle 
that promotes the degradation of p27Kip1. It was also shown 
to stabilize KPC1 (Hara et al., 2005). 

RESULTS 

Identification of KPC1 as the p105 Ub Ligase 

One of the still missing links in the Ub-mediated activation 
pathway of NF-κB is the identity of the ligase that ubiquitinates 
p105, resulting in its proteasomal processing to the p50 active 
subunit. To identify the ligase, we sequentially fractionated rabbit 
reticulocyte lysate using different chromatographic principles 
(Figure 1Ai). Each fraction along the different steps was monitored 
for E3 activity in a cell-free reconstituted conjugation 
assay containing in vitro translated 35S-labeled p105 as a substrate 
(Figure 1Aii). To avoid ubiquitination by the βTrCP ligase, 
we used the p105S927A mutant that cannot be phosphorylated 
by IKKβ and therefore cannot bind this E3. Employing mass 
spectrometric analysis, peptides derived from the KPC Ub ligase 
were identified in active fractions along the three last chromatographic 
steps. In the last step of purification (heparin), we identified 
58 KPC1 peptides and seven KPC2 peptides covering 
43.21% and 19.8% of the open reading frames, respectively 
(Figure 1B). Because of a lack of sequence information on rabbit 
KPC2, we used the sequence of the mouse protein to demonstrate 
the coverage map. The changes between the two species 
are negligible (but shown). 



To test directly the role of KPC in p105 ubiquitination and 
processing, we established a cell-free conjugation assay using 
labeled p105 as a substrate and purified KPC1 or its catalytically 
inactive species (mutated in the RING domain) KPC1I1256A as 
the ligase. The wild-type (WT) ligase catalyzed conjugation of 
p105, whereas the inactive ligase did not (Figure 2A). It appears 
that KPC1 activity is specific to p105, as it scarcely modifies 
p100 that is highly homologous to p105 and also undergoes 
limited proteasomal processing, most probably by a different 
ligase (Figure S1A). 

To demonstrate the ability of KPC1 to modify p105 in cells, we 
overexpressed FLAG-p105 along with HA-Ub in HEK293 cells, in 
which KPC1 was either silenced (Figure 2B, lane 1), or overexpressed 
(Figure 2B, lanes 2 and 3). Immunoprecipitation of 
p105 revealed that it is sparsely ubiquitinated in the absence of 
the ligase, and ubiquitination is increased significantly following 
overexpression of KPC1 (Figure 2Bi; immunoprecipitation [IP], 
compare lanes 1 and 2). Furthermore, we found that p105 
binds to KPC1 and co-immunoprecipitates with it (Figure 2Biii; 
IP, lane 2). In addition, we demonstrated that endogenous 
KPC1 interacts with endogenous p105 (Figure S1B). 

KPC1 Promotes Basal and Signal-Induced Processing 
of p105 

To demonstrate the involvement of KPC1 in p105 processing, we 
silenced its expression in cells using small interfering RNA 
(siRNA). As can be seen in Figure 2C, the silencing of KPC1 
decreased the amount of p50 generated from p105. In a different 
experiment, we expressed in HEK293 cells FLAG-p105 along 
with Myc-KPC1 or Myc-KPC1I1256A. Less p50 was generated 
in the presence of the KPC1 mutant (Figure S1C). 

As noted, processing of p105 also occurs following stimulation. 
It was interesting to study whether KPC1 can promote 
p105 processing under these conditions as well. Therefore, we 
tested the generation of p50 from p105 following expression of 
constitutively active IKKβ (IKKβS176,180E) in the presence 
(endogenous) or absence (silenced) of KPC1. As expected, the 
stimulation increased the processing of p105 (compare Figure 2D 
to Figure 2C; control siRNA). Silencing of KPC1 significantly 
decreased the generation of p50 following stimulation, strongly 
suggesting a role for KPC1 in signal-induced processing 
(Figure 2D). It is known that under the influence of the kinase, the 
precursor is not only processed but also degraded to a significant 
extent (compare Figure 2D to Figure 2C and note in 
particular the decreasing amount of p105 + p50 remaining over 
time following stimulation). It should be noted that the degradation 
rate of p105 following stimulation was significantly higher 
in cells that lack KPC1 (Figure 2D). It is possible that the processing 
of p105 mediated by KPC1 and its degradation mediated 
by βTrCP occur in parallel. When one process is inactivated, 
the other becomes dominant. The influence of KPC1 on 
signal-induced processing of p105 appears to be specific, as its 
silencing does not affect the processing of p100 following 
NF-κB-inducing kinase (NIK) expression (Figure S1D). 

In all these experiments, we used exogenously expressed 
p105. To demonstrate the effect on endogenous p105, we 
used the human haploid cell line HAP1 in which the single allele 
of KPC1 or KPC2 was knocked out using the CRISPR-Cas 
Figure 1. Purification and Identification of the p105 Ub Ligase 

(A) (i) Scheme of the chromatographic resolution of Fraction II monitoring the E3 ligating activity toward p105. Numbers represent salt concentrations (M) or 
molecular weight (kDa) at which the ligating activity was eluted from the respective columns. Fr II, Fraction II. (ii) E3 conjugating activity profile along the fractions 
resolved by the Superdex 200 gel filtration column. In vitro translated and 35S-labeled p105S927A was ubiquitinated in a reconstituted cell-free system in the 
presence of the resolved fractions. 

(B) (i) Peptide coverage map of rabbit KPC1. The peptides were identified through mass spectrometric analysis of the E3-containing fractions resolved by the last, 
heparin-based column. (ii) Peptide coverage map of mouse KPC2. The peptides were identified through mass spectrometric analysis of the E3-containing 
fractions resolved by the last, heparin-based column. Residues marked in bold and italics denote differences in sequence between mouse and rabbit. 



technology. Elimination of KPC1 or KPC2 (that stabilizes KPC1 
[Hara et al., 2005]; note that removal of KPC2 results in a significant 
decrease in the level of KPC1; Figure 2E) decreased 
the generation of p50 both in the presence and in the absence of TNFα 
(Figure 2E). In contrast, the level of p65 was not affected. The 
finding that p50 is still present, albeit at a decreased level, in 
the KPC1 KO cells, may be due to the activity of another, yet 
to be identified ligase, and/or to co-translational processing of 
the nascent peptide that occurs before completion of the p105 
precursor synthesis (Lin et al., 1998). It should also be noted 
that the effect of KPC1 on p50 generation is significantly more 
pronounced in tumors growing in mice than in cultured cells 
(see below). 

Our finding that KPC1 mediates processing under both basal 
and stimulated conditions prompted us to dissect the mechanism 
involved. We monitored the interaction between KPC1 
and p105 under basal and stimulated conditions and found 
that expression of constitutively active IKKβ results in increased 
interaction between the two as assayed by co-immunoprecipitation 
(Figures 2F and S1E). The finding that the interaction of 
p105S927A with KPC1 is not affected by IKKβ (Figure 2Fi, lanes 
4 and 5) attests to the specificity of the effect of IKKβ in 
phosphorylating a specific Ser residue (927) in p105. As expected, 
we found that ubiquitination of phosphorylated p105 by KPC1 
is stronger compared to that of the non-phosphorylated species 
(Figure S1F). 

To further confirm that KPC1 interacts more efficiently with 
phosphorylated p105, we designed an experiment in which a synthetic 
phosphorylated peptide derived from the p105 IKKβ-phosphorylation 
site competed with p105 for binding to the ligase. 
The phosphorylated peptide inhibited ubiquitination 
of p105 by KPC1 to a larger extent compared with its 
non-phosphorylated species, both in a crude system and in a 
system made of purified components (Figures 2G and S1G, 
respectively). 

Role of KPC2 in KPC1-Mediated p105 Ubiquitination 
and Processing 

At that point, it was important to study the role of KPC2, the 
partner of KPC1 in the heterodimeric ligase complex, in p105 
modification and processing. We noted that its addition to a 
reconstituted cell-free system decreases significantly the ubiquitination 
of p105 by KPC1 (Figure 2H). This was true also when 
p105 was purified by a specific antibody, ruling out a possible 
effect of other components present in the mixture in which the 
labeled p105 was translated (Figure S2Ai). To rule out that the 
reduced ubiquitination of p105 in the presence of KPC2 is due 
to a possible deubiquitinating activity of the protein, we added 
it to the cell-free ubiquitination system after KPC1, when most 
of the ubiquitination reaction was completed. It had no effect 
on the conjugates pattern (Figure S2Aii). The interference of 
KPC2 in chain formation appears to be specific to KPC1 and 
p105, as it did not affect the ligase activity of E6-AP toward 
RINGIB'^^^ (Zaaroor-Regev et al., 2010) (Figure S2B). 

Importantly, in correlation with the suppressive effect of KPC2 
on KPC1-mediated ubiquitination of p105, silencing of KPC2 



Figure 2. p105 Is a Substrate of KPC1 in a Cell-free System and in Cells, Both under Basal Conditions and following Signaling 

(A) Ubiquitination of in vitro translated and 35S-labeled p105 by Fraction II and purified KPC1-FLAG-TEV-6xHIS or KPC1I1256A-FLAG-TEV-6xHIS in a reconstituted 
cell-free system. Fr II, Fraction II. 

(B) KPC1 ubiquitinates p105 in cells. HEK293 cells that were transfected with siRNA to silence KPC1 (lane 1) or with control siRNA (lanes 2 and 3), were also 
transfected with cDNAs coding for FLAG-p105 (lanes 1 and 2), HA-Ub (lanes 1-3), and Myc-KPC1 (lanes 2 and 3). FLAG-p105 and its conjugates were 
immunoprecipitated from the cell lysates using immobilized anti-FLAG (IP; lanes 1-3), resolved via SDS-PAGE, and visualized using anti-HA (Bi) or anti-FLAG (Bii). 
KPC1 was visualized using a specific antibody to the protein (Biii). Ten percent of total cell lysates (TCL; lanes 1-3) were analyzed for expression of FLAG-p105, 
HA-Ub or Myc-KPC1, using anti-HA (Bi), anti-FLAG (Bii), or anti-KPC1 (Biii), respectively. IP, immunoprecipitation; WB, western blot. 

(C) Silencing of KPC1 affects basal processing of p105. HEK293 cells were transfected with control siRNA (lanes 1-3) or siRNA to silence KPC1 (lanes 4-6). After 
24 hr, cells were transfected with cDNAs coding for FLAG-p105. Processing of p105 was calculated as the ratio between the amount of p50 at the specified time and 
the sum of p50 + p105 at time zero (in order to disregard degradation of p105 in our calculations), multiplied by 100%. The amount of p50 + p105 remaining 
(reflecting degradation over time) was calculated as the sum of p50 + p105 measured at the relevant time point, divided by the sum of p50 + p105 at time zero, 
multiplied by 100%. 

(D) Silencing of KPC1 inhibits signal-induced processing of p105. HEK293 cells were transfected with control siRNA (lanes 1-3) or siRNA that targets KPC1 (lanes 
4-6). After 24 hr, cells were transfected with cDNAs coding for FLAG-p105 and IKKβS176,180E. Twenty-four hours after transfection (in the experiments depicted 
under C and D), cycloheximide was added for the indicated times, and cells were lysed, resolved via SDS-PAGE, and proteins visualized using anti-FLAG, 
anti-KPC1 or anti-actin. Processing and degradation were assessed as described under (C). Chx, cycloheximide. Actin was used to ascertain equal protein loading. 

(E) Deletion of KPC1 or KPC2 genes inhibits basal and TNFα-induced processing of endogenous p105. Lysates were prepared from HAP1 control or HAP1 cells 
knocked out for the genes coding for KPC1 or KPC2. The lysates were resolved via SDS-PAGE, and proteins were visualized using anti-NF-κB1, anti-KPC1, 
anti-KPC2, anti-p65, or anti-actin. The amount of p105 processed was calculated as the ratio between the generated p50 and the sum of p50 + p105, multiplied by 
100%. 

(F) The interaction between p105 and KPC1 increases following signaling. HEK293 cells were transfected with cDNAs coding for FLAG-p105 (lanes 2 and 3) or 
FLAG-p105S927A (lanes 4 and 5) along with Myc-KPC1 (lanes 1-5) and FLAG-IKKβ (lanes 2 and 4) or FLAG-IKKβS176,180E (lanes 3 and 5). FLAG-p105 and 
FLAG-p105S927A were immunoprecipitated from the cell lysate using immobilized anti-FLAG (lanes 1-5), and the bound KPC1 was visualized with anti-KPC1 
(Fi). Immunoprecipitated p105s were visualized using anti-FLAG (Fii). 

(G) A phosphorylated peptide corresponding to the signaled sequence in p105 inhibits its ubiquitination. In vitro translated and 35S-labeled p105 was 
ubiquitinated by purified KPC1-FLAG-TEV-6xHIS (lanes 2-9) in a reconstituted cell-free system in the presence of a phosphorylated peptide derived from the signaled 
sequence of p105 (lanes 6-8), or in the presence of its non-phosphorylated counterpart (lanes 3-5). Presented is the change (in %) in unconjugated p105 
remaining following addition of increasing concentrations of the peptides (compared to a system to which a peptide was not added; lane 2). 

(H) KPC2 attenuates ubiquitination of p105 by KPC1. Ubiquitination of in vitro translated and 35S-labeled p105 by purified KPC1-FLAG-TEV-6xHIS in the presence 
or absence of HIS-KPC2 was carried out in a cell-free reconstituted system. 

(I) KPC2 attenuates processing of p105 in cells. HEK293 cells were transfected with control siRNA (lanes 1-3) or siRNA to silence KPC2 (lanes 4-6). After 24 hr, 
cells were transfected with cDNAs coding for FLAG-p105 and generation of p50 was monitored 24 hr later. Processing of p105 was calculated as described 
under (C). 

(J) KPC1 modifies lysine residues in the C-terminal segment of p105. In vitro-translated and 35S-labeled WT and the indicated p105 mutants were subjected to 
ubiquitination by purified KPC1-FLAG-TEV-6xHIS in a reconstituted cell-free system. 

See also Figures S1 and S2. 
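
The two quantities defined in the legend to (C), percent processing and percent of p50 + p105 remaining, can be sketched as a short calculation. This is only an illustration of the stated formulas: the function names and the band intensities below are hypothetical and are not taken from the study.

```python
def processing_percent(p50_t, p50_0, p105_0):
    """Percent processing: p50 at time t divided by (p50 + p105) at time zero."""
    total_0 = p50_0 + p105_0
    if total_0 <= 0:
        raise ValueError("time-zero signal (p50 + p105) must be positive")
    return 100.0 * p50_t / total_0


def remaining_percent(p50_t, p105_t, p50_0, p105_0):
    """Percent remaining: (p50 + p105) at time t divided by (p50 + p105) at time zero."""
    total_0 = p50_0 + p105_0
    if total_0 <= 0:
        raise ValueError("time-zero signal (p50 + p105) must be positive")
    return 100.0 * (p50_t + p105_t) / total_0


# Hypothetical densitometry readings (arbitrary units) from a cycloheximide chase.
chase = [
    {"t_hr": 0, "p105": 80.0, "p50": 20.0},
    {"t_hr": 2, "p105": 60.0, "p50": 30.0},
    {"t_hr": 4, "p105": 40.0, "p50": 35.0},
]

zero = chase[0]
for point in chase:
    processed = processing_percent(point["p50"], zero["p50"], zero["p105"])
    remaining = remaining_percent(point["p50"], point["p105"], zero["p50"], zero["p105"])
    print(f'{point["t_hr"]} hr: processing {processed:.1f}%, remaining {remaining:.1f}%')
```

Normalizing p50 to the time-zero total, rather than to the total at each time point, keeps degradation of p105 from inflating the apparent processing, which is the reason given in the legend.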
Figure 3. KPC1-Dependent Ubiquitination and Processing of p105 Require the ARs of p105 

(A) Schematic representation of p105 domains. Numbers denote the respective residue along the protein sequence. RHD, Rel homology domain; NLS, nuclear 
localization signal; GRR, glycine rich repeat; AR, ankyrin repeats (all six of them are marked). 

(B) The ARs-containing C-terminal half of p105 is ubiquitinated by KPC1. In vitro-translated and 35S-labeled p105, p105Δ501-969 or p105Δ1-434 were subjected 
to ubiquitination by purified KPC1-FLAG-TEV-6xHIS in a reconstituted cell-free system. 

(C) The ARs of p105 are essential for binding of KPC1 and for its ubiquitination by the ligase in cells. HEK293 cells that were transfected with siRNA to silence 
KPC1 (lanes 1 and 2) or with control siRNA (lanes 3-5), were also transfected with cDNAs coding for FLAG-p105 (lanes 1 and 3), p105Δ544-803 (lanes 2 and 4), 
HA-Ub (lanes 1-5), and Myc-KPC1 (lanes 3-5). The different FLAG-p105 species and their conjugates were immunoprecipitated from the cell lysates by 
immobilized anti-FLAG (IP; lanes 1-5). 

(D) KPC1 interacts with a single AR in p105. HEK293 cells were transfected with cDNAs coding for FLAG-p105 (lane 2), FLAG-p105Δ544-803 (lane 3), or 
FLAG-p105Δ574-803 (lane 4), along with Myc-KPC1 (lanes 1-4). The different FLAG-p105 species were immunoprecipitated from the cell lysates using immobilized 
anti-FLAG (IP; lanes 1-4). 

increased the formation of p50 (Figure 2I). This occurred despite the fact 
that the short-term silencing partially reduced the level of KPC1 
via its effect (or absence thereof) on the stabilization of the ligase 
(Figure 2I; note the change in the level of KPC1 following KPC2 
silencing). 

Identification of the Ub Anchoring Sites on p105 
Modified by KPC1 

We have already shown that multiple lysines in the C-terminal 
segment of p105 are required for its ubiquitination and processing 
(Cohen et al., 2004; Kravtsova-Ivantsiv et al., 2009) in 
crude extracts. It was therefore important to show that this is 
true also for KPC1. Progressive removal of all lysine residues 
from the C-terminal segment (Figure S2C) resulted in a corresponding 
decrease in conjugation of p105 by KPC1 in a cell-free assay 
(Figure 2J) and in processing of the precursor in cells 
(Figure S2D). 

The C-Terminal ARs of p105 Are Necessary for Its 
Interaction with KPC1 and for Its Subsequent 
Ubiquitination and Processing 

p105 harbors several domains: Rel homology domain (RHD), 
nuclear localization signal (NLS), and a glycine rich repeat 
(GRR) in its N-terminal segment, and ARs, death domain (DD), 
and a PEST (proline, glutamate, serine, and threonine) sequence 
in the C-terminal segment (Figure 3A). We examined which of 
these domains is necessary for ubiquitination by KPC1. As can 
be seen in Figure 3B, removal of the C-terminal segment 
abolished altogether conjugation in a cell-free system, whereas 
removal of the N-terminal segment had no effect. Subsequently 
we found that removal of all six ARs (p105Δ544-803) affected 
significantly the ubiquitination of p105 by KPC1 (Figure S3A, 
compare lanes 2 and 12). Partial deletion of the repeats affected 
conjugation only slightly (compare lane 2 to lanes 4, 6, 8, and 10). 

Similar results were obtained in experiments carried out in 
cells. Overexpression of KPC1 increased the ubiquitination 
of WT p105, but much less so of p105 that lacks all its ARs 
(Figure 3Ci; IP, compare lane 4 to lane 3). Importantly, in parallel, 
we observed also a decrease in the interaction between the 
AR-truncated p105 and its ligase compared to WT p105 
(Figure 3Ciii; IP, compare lanes 4 and 3). 

To rule out the possibility that the decrease in ubiquitination of 
p105 that lacks all its ARs is due to removal of the eight lysine 
residues in the repeats, we generated a mutant p105 in which 
all these lysines were substituted by arginines. The ubiquitination 
of the K to R mutant, as well as its interaction with KPC1, were 
similar to that of WT p105 (Figures S3Bi and ii, respectively). 
An interesting question relates to the number of ARs necessary 
for ubiquitination and processing of p105. We constructed a 
p105 mutant where all ARs except one have been deleted 
(p105Δ574-803). The single remaining AR was sufficient to 
bind KPC1 and to promote processing similar to that observed 
for WT p105 (Figure 3D). Thus, it appears that the ARs are redundant 
with relation to binding of KPC1. 

Last, it was important to demonstrate whether the ARs-dependent 
ubiquitination increases the processing of p105. As can be 
seen in Figure 3E, mutant p105 that lacks all ARs is processed 
much less efficiently compared to the WT species and to one 
lacking only some of the repeats (compare lane 12 to lanes 2, 
4, 6, 8, and 10). A similar result was obtained also in cells 
(Figure 3F, lanes 1 and 2). Mutant p105 in which all lysine residues 
in the ARs were substituted with arginines (FLAG-p105K8R) is 
processed similarly to WT p105 (Figure S3C, lane 3), strongly 
suggesting that the ARs are required for the binding, ubiquitination, 
and processing of p105, but do not serve as ubiquitination 
sites essential for processing. 

It appears that the ARs are also involved in signal-induced 
processing of p105, as their removal significantly decreased 
IKKβ-mediated generation of p50 (Figure 3F, compare lane 4 
to lane 3). As expected, FLAG-p105S927A and 
FLAG-p105S927AΔ544-803 did not respond to IKKβ-mediated 
phosphorylation (Figure 3F, lanes 7 and 8). 

Overexpression of KPC1 or p50 Suppresses 
Tumor Growth 

Since NF-κB dimers are known to affect cell survival, proliferation, 
and tumor progression, it was interesting to study the 
effect of KPC1 on cell growth. Initially, we monitored the 
influence of overexpressed KPC1 on anchorage-independent 
growth in MDA-MB 231, U2OS, and U87-MG cells, and found 
that it inhibits colony formation by 36%, 32%, and 52%, respectively, 
compared to controls (Figures 4A-4C). Importantly, this 
effect was abrogated in cells overexpressing the inactive ligase 
species KPC1I1256A, suggesting that the inhibitory effect is 
due to the ligase activity (Figure 4C). Cells expressing p50 
showed an even stronger inhibition of colony formation (73% 
for both MDA-MB 231 and U87-MG cells; Figures 4A and 4C), 
strongly suggesting that the effect of the ligase is mediated 
through its activity on p105, resulting in excessive generation 
of p50. Supporting the linkage is the finding that silencing of 
p105 abrogated the strong suppressive effect of KPC1: the 
number of colonies formed using cells that overexpress KPC1 
in the absence of p105 was 7.5-fold larger than that formed in 
its presence (Figure 4D). It was important to establish that the 
growth suppressive effect of KPC1 and p50 is not due to induction 
of apoptosis. Thus, we stained U87-MG cells that overexpress 
these proteins for cleaved caspase 3. As can be seen in 
Figure S4, we could not detect the apoptotic marker. For that 



(E) p105 that lacks its ARs is processed less efficiently in a cell-free system. The different 35S-labeled p105 species were processed in a cell-free reconstituted 
system in the presence or absence of Fraction II as indicated. Fr II, Fraction II. 

(F) Deletion of the ARs of p105 affects both its basal and signal-induced processing. HEK293 cells were transfected with cDNAs coding for FLAG-p105, 
FLAG-p105Δ544-803, FLAG-p105S927A, or FLAG-p105S927AΔ544-803 along with either GFP or IKKβ as indicated. 

In (C), (D), and (F), proteins were resolved via SDS-PAGE, blotted onto nitrocellulose membrane, and p105 and p50 were detected using anti-FLAG, KPC1 was 
detected using anti-KPC1, and Ub conjugates were detected using anti-HA. Ten percent of the total cell lysates (TCL) were analyzed for the expression of 
proteins. The SDS-PAGE-resolved labeled proteins in the experiments shown in (B) and (E) were visualized using Phosphorimaging. Processing was assessed 
as described under Figure 2E. 

See also Figure S3. 
Figure 4. KPC1 and p50 Suppress Anchorage-Independent Growth of Cells 

Suppression of colony formation by overexpressed KPC1 or p50 in MDA-MB 231 (Ai), U2OS (Bi), and U87-MG (Ci and Di) cells. Cells were stably transfected with 
VO, or with cDNAs that code for Myc-KPC1, Myc-KPC1I1256A or FLAG-p50, or with cDNA coding for Myc-KPC1 along with control shRNA or shRNA to silence 
p105, as indicated, and were seeded on soft agar plates. After 3 weeks, the colonies were stained with 0.05% crystal violet. Data derived from five experiments 
(±SEM) are presented graphically. Expression of KPC1, KPC1I1256A, p50, and p105 is shown in (Aii), (Bii), (Cii), and (Dii). 

See also Figure S4. 
experiment, it was also important to demonstrate that the suppressive 
effect of KPC1 and p50 is not due to some non-specific 
effect of the overexpression of the proteins. The release of 
growth suppression in the presence of overexpressed KPC1 but in the 
absence of p105 (Figure 4D) strongly suggests that the effect 
of KPC1 is indeed specific. 

These observations prompted us to study the effect of KPC1 in 
a tumor model in mice. We generated xenografts stably overexpressing 
VO, KPC1, KPC1I1256A, or p50. Both the growth rate 
and weights of tumors expressing KPC1 and p50 were significantly 
smaller compared to those that harbor VO or KPC1I1256A 
(Figures 5Ai and 5Aiii for xenografts derived from U87-MG cells, 
and Figure 5Aii for xenografts derived from MDA-MB 231 cells). 
Importantly, in tumors that overexpress KPC1, the level of p50 is 
significantly higher compared with tumors that express VO 
(Figure 5Aiv), suggesting again a direct linkage between the KPC1 
ligase activity and increased generation of p50. Interestingly, 
in tumors that overexpress KPC1 or p50, we also observed a 
significant decrease in the level of p65 (Figure 5Aiv). This finding 
raises the possibility that a different NF-κB transcription factor 
is generated under the influence of KPC1, possibly a p50 homodimer 
(see Discussion). To demonstrate that there are indeed 
changes in NF-κB species in human tumor xenografts overexpressing 
KPC1 and p50, we used electrophoretic mobility shift 
assay (EMSA) to monitor the activity of the transcription factor. 
As can be seen in Figure S5A, there is a significant decrease in 
the ability of “canonical” NF-κB to bind its consensus DNA 
sequence following overexpression of KPC1 and even more so 
following overexpression of p50. 

Of note, all the effects on tumor growth (reduction in colony 
formation, tumor growth rate, and weight) were more prominent 
in p50-expressing tumors than in their KPC1-overexpressing 
counterparts. This is not surprising, as p50 is the product of 
KPC1 activity, and its direct expression has a stronger effect. 

The functional linkage between KPC1 and p50 can also be 
observed in staining of specific proliferation and differentiation 
markers in the mice-derived tumors. The overexpression of 
KPC1, but not of KPC1I1256A, results in increased appearance 
of nuclear NF-κB (Figure 5B), a significant decrease in the 
proliferation marker Ki-67, and an increase in the glial fibrillary acidic 
protein (GFAP), a known glial cell differentiation marker. 
Suspecting that KPC1 stimulates apoptosis, we looked for an 
increase in cleaved caspase 3; however, there was no change 
in the levels of the active enzyme compared to control sections. 
Staining of p27Kip1, a suggested substrate of KPC1 (Kamura 
et al., 2004), did not show any change in the protein level 
(Figure 5B). This may be due to the differences in the systems 
studied. 

KPC1 Regulates Expression of a Subset of p50 
Target Genes 

We next analyzed the profile of gene expression in the tumors using 
RNA sequencing (RNA-seq) analysis of transcripts mapped 
to the human genome (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60530; 
Table S1). The altered gene expression 
patterns revealed a strong similarity between overexpression 
of p50 and KPC1 in U87-MG xenografts (correlation of 
0.51, p value < 10“^°°; Figure 6Ai), with 48 downregulated 



and 534 upregulated genes that were consistent and significant 
in all replicates (Figure 6Aii; Table S2). The relative transcript 
levels of selected genes that were shown to be significantly 
upregulated and downregulated in RNA-seq analyses were 
corroborated by quantitative real-time PCR (qRT-PCR) 
(Figure S5B). Functional analysis revealed highly significant enrichment 
in glycosylated and extracellular matrix proteins, and 
upregulation of genes expressing proteins involved in cell-cell 
and cell-substrate adhesion, regulation of cell migration, cell 
junctions, vasculature development, wound healing, and cell-cell 
signaling (Figure 6Aiii), suggesting a re-establishment 
of “social” micro-environmental interactions in the p50- and 
KPC1-overexpressing glioblastoma tumors (Bonavia et al., 
2011). Downregulated processes included a reduced response 
to hypoxia required for maintaining glioblastoma stem cells 
(Heddleston et al., 2009), as well as reduced positive regulation 
of cell migration (Figure 6Aiii). Of the consistently changed 
genes, 21 are known NF-kB targets (p value < 3.4 x 10“®; 
http://www.bu.edu/nf-kb/gene-resources/target-genes/). To 
further assess if the observed reduction in tumor size was the 
consequence of a reduction in proto-oncogenes and/or of 
an increased expression of tumor suppressor genes, we 
gathered gene annotations from various sources. Enrichment 
analysis on these gene annotations revealed a significant 
(p value < 1.4 X 10“^®) increase in the expression of 40 tumor 
suppressor genes, with no significant change in other classes 
(Figure 6B). 
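
The text does not state which statistical test was used for the enrichment analysis. As an illustration only, a one-sided hypergeometric test is one common way to ask whether a gene class is over-represented among differentially expressed genes. In the sketch below, the function name, the background size of 20,000 genes, and the count of 700 annotated tumor suppressors are hypothetical assumptions; only the 534 upregulated genes and the 40 tumor suppressor genes come from the text, and treating those 40 as a subset of the 534 is itself an assumption.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """One-sided hypergeometric p value: P(X >= overlap).

    population: number of genes in the background set
    annotated:  background genes carrying the annotation of interest
    selected:   genes in the list being tested (e.g., upregulated genes)
    overlap:    selected genes that carry the annotation
    """
    if not (0 <= annotated <= population and 0 <= selected <= population):
        raise ValueError("annotated and selected must lie between 0 and population")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    upper = min(annotated, selected)
    # Exact integer arithmetic; math.comb returns 0 when k exceeds n.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(population, selected)


# Hypothetical background (20,000 genes) and annotation size (700 genes);
# 534 and 40 are the counts reported in the text.
p_value = hypergeom_enrichment_p(population=20000, annotated=700, selected=534, overlap=40)
print(f"enrichment p value under these assumed counts: {p_value:.3g}")
```

The resulting p value depends strongly on the assumed background and annotation sizes, so it should not be compared with the value reported in the text.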

Finally, we integrated functional annotation enrichment and 
protein-protein interactions for the differentially regulated genes, 
which revealed a dense network of upregulated genes revolving 
around a few downregulated ones, such as interleukin-6 (IL-6), 
interleukin-6 receptor (IL-6R), and vascular endothelial growth 
factor A (VEGFA) (Figure 6C; Data S1). We included KPC1 and 
NF-κB in this analysis to retrieve possible known interactions, 
although KPC1 had no known interactions with any of the differentially 
regulated genes. A closer look at the core interaction 
network (Figure 6C, inset magnification), which includes NF-κB, 
shows that it is most prominently annotated with “regulation of cell migration” 
genes. Most other core network genes are upregulated and 
include many well-known tumor-suppressor genes. 

Taken together, our findings strongly suggest a model of 
KPC1/p50-driven inhibition of glioblastoma tumor growth that 
centers around downregulated high mobility group protein 
HMGI-C (HMGA2), lin-28 homolog A (LIN28), IL-6/IL-6R, and 
VEGFA, and upregulated tumor suppressors, which in combination 
control the tumor microenvironment as well as glioblastoma 
stem cell maintenance. 

Correlation between Expression of KPC1 and p50 in 
Human Tumoral and Normal Tissues 

Finally, we examined the relationship between KPC1 and p50 
in human tumors and normal tissues. Immunohistochemical 
staining of the two proteins (the two antibodies were shown to 
be specific; see Figure S6) revealed a high correlation between 
them in squamous cell carcinoma of head and neck (SCCHN, 
52 sections; p value < 0.005; Figure 7Aii), breast cancer (105 sections; 
p value < 0.0001), and glioblastoma (192 sections; 
p value < 0.0017) (Figure 7Ai). It should be emphasized though 
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Figure 5. KPC1 -Mediated Excessive Gener- 
ation of p50 Inhibits Tumor Growth 

(A) Growth rates and weights of tumor xenografts 
grown in mice, and derived from U87-MG (i) and 
MDA-MB 231 (ii) ceiis expressing VO, Myc-KPCI, 
and FLAG-p50. Data represent the mean of seven 
xenografts ± SEM (iii) Tumors derived from U87- 
MG ceiis 3 weeks after inocuiation. (iv) Enhanced 
generation of p50 and disappearance of p65 in 
tumors that overexpress KPC1 and p50. Proteins 
were resoived via SDS-PAGE, biotted onto nitro- 
ceiiuiose membrane, and detected using the 
appropriate antibodies. Processing was assessed 
as described under Figure 2E. 

(B) immunohistochemicai staining of p50, 
KPC1, ki-67, cieaved caspase 3, p27Kip1, and 
GFAP in xenografts of U87-MG ceiis stabiy 
expressing VO (i), Myc-KPCI (ii), FI_AG-p50 (iii), or 
KPC1i1256A (iv). 

Aii scaie bars, 100 |am. Tumors were grown in mice 
and stained as described in the Experimental 
Procedures. 

See also Figure S5. 
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that the linkage may be tumor-specific and not common to all 
patients with the “same” tumor. 

To test the hypothesis that loss of KPC1 and nuclear p50 can
be involved in the pathogenesis of malignant transformation, we
analyzed the staining of the two proteins in SCCHN, breast cancer,
and glioblastoma, and compared it to their staining in the
normal parallel tissue. We observed a strong decrease in tumor
samples stained for nuclear p50 compared to the healthy tissue
(Figure 7B). As for KPC1, we observed a significant decrease in
staining intensity (reflecting the amount of the protein) in
cancerous compared to normal tissue in both SCCHN and glial
cells, but not in breast cancer. Also, we noted a significant
decrease in the number of tumor samples stained for KPC1 in
SCCHN (Figure 7B). Taken together, these findings suggest
that nuclear p50 is indeed a tumor suppressor lost in many
malignancies. At least part of this p50 loss may be due to loss of
KPC1, which catalyzes its formation, though this may not be
common to all tumors.

DISCUSSION 

The vast majority of substrates of the Ub proteasome system are
completely degraded. One intriguing and exceptional case is
that of the p105 precursor of NF-κB that can be either completely
degraded or processed in a limited manner to yield the p50
active subunit of the transcription factor. The “decision-making”
mechanism resulting in one of the two distinct processes has
remained largely elusive. The βTrCP Ub ligase has been identified
as the tagging enzyme involved in complete degradation of
p105, whereas the ligase involved in processing has remained
unknown. We have now identified the KPC complex as the putative
p105-processing ligase (Figures 1, 2, and 3).

Now that the two E3s involved in degradation and processing
of p105 have been identified, it is still not known why ubiquitination
by one enzyme results in a completely different fate of p105
than ubiquitination by the other and what determines the timing
of the two reactions. It is possible that the two ligases catalyze 
the formation of chains that differ in their anchoring sites, length, 
and/or internal linkages. These, in turn, affect the recognition 
and mechanism of action of the 26S proteasome. As for timing, 
it can be that different physiological conditions and/or the 
degree of saturation of the ARs with bound p50s are involved 
in the “decision-making” process of whether the molecule will 
be processed or destroyed completely. 

Studying the biological implications of manipulating KPC1
revealed that it suppresses anchorage-independent growth in
a manner that is dependent on its ligase activity and the presence
of p105. A corollary strong tumor-suppressive effect was
demonstrated in xenografts of human tumors (Figures 4, 5, and 6).
This effect is probably caused by a significant increase in
an entire set of tumor suppressors, some of which, like the
brain-specific protein cell adhesion molecule 3 (CADM3) (Gao
et al., 2009), were found to be inactivated in glioblastoma.

An important question relates to the transcriptional mechanism
by which KPC1 and p50 exert their tumor-suppressive effect. An
obvious assumption is that the stoichiometric excess of p50
generated by KPC1 would generate mostly p50-p50 homodimers
rather than the “canonical” tumorigenic p50-p65 heterodimers.
Also in line with this assumption is the observation that the p65
level is decreased in KPC1-overexpressing as well as in
p50-overexpressing xenografts (Figure 5Aiv). It appears that each
dimer of the NF-κB family has unique and even opposing biological
function(s) and regulates a distinct subset of downstream genes
(Siggers et al., 2012). The p50 homodimer is supposed to act as a
transcriptional repressor because it does not contain a transactivation
domain (May and Ghosh, 1997). However, studies in vitro have
shown that the p50 homodimer can interact with different
transcriptional modulators, such as Bcl-3 (Fujita et al., 1993), p300
(Deng and Wu, 2003), or HMGA1/2 (Perrella et al., 1999), that are
involved in chromatin remodeling. Disproportionate p50 may shift
the composition of NF-κB dimers, resulting in an overall
tumor-suppressive effect. Indeed, following overexpression of KPC1 or
p50, there is a decrease in the level of what is probably the
“canonical” tumorigenic NF-κB (p50-p65; Figure S5A).

Importantly, we found a strong correlation between the
expression of KPC1 and that of p50 in human tumors (Figure 7A).
Moreover, we found a significant decrease in nuclear p50 and
KPC1 staining intensity in tumors compared to non-malignant
tissue (Figure 7B). This observation suggests that loss of
nuclear p50 may trigger malignant transformation. In line with
these findings are data collected in the Catalog Of Somatic
Mutations in Cancer (COSMIC) that show a significantly
greater number of common tumors (e.g., breast, lung, CNS,
and uterine cervix) with decreased expression of KPC1
transcripts compared to those with high expression
(http://cancer.sanger.ac.uk/cosmic/gene/analysis?ln=RNF123&ln1=RNF123&start=1&end=1315&coords=AA%3AAA&sn=&ss=&hn=&sh=&samps=1001&expn=over&expn=under&id=4185).

EXPERIMENTAL PROCEDURES 

Materials, Plasmids, Expressed Proteins, and Cells 

All materials (including plasmids and their construction, expression of proteins
and their purification, and cultured cells and their manipulation) are described
in the Extended Experimental Procedures.

Preparation and Fractionation of Crude Reticulocyte Lysate 

Reticulocytes were induced in rabbits and lysates were prepared and
fractionated over DEAE cellulose to Fraction I (unabsorbed material) and
Fraction II (high salt eluate) as described (Hershko et al., 1983).
Fraction II (~200 mg) was further resolved using different successive
chromatographic methods as described in the Extended Experimental Procedures.

genes in the xenografts. Dot sizes are as in (i). (iii) Selected annotation clusters most enriched for either up- or downregulated genes (above or below dashed line,
respectively).

(B) (i) Enrichment analysis of consistently upregulated and downregulated transcripts for tumor suppressor and proto-oncogene annotations. (ii) Expression
differences for all tumor suppressors (blue) and proto-oncogenes (red) from (i). Gene names of the strongest differentially regulated cancer-related genes are
shown.

(C) Integrated analysis of functional annotation clusters and known functional and physical protein-protein interactions among all consistently upregulated and
downregulated genes (green and red, respectively). NF-κB is shown in blue, and a close-up of the core interaction network surrounding NF-κB (inset) is displayed.
See also Figure S5, Tables S1 and S2, and Data S1.

Figure 7. Correlation between the Expression of KPC1 and p50 in Tumoral and Normal Human Tissues

(A) (i) Correlation between expression of KPC1 and p50 in tumors. Immunohistochemistry of KPC1 and p50 in serial sections from SCCHN, and glioblastoma
and breast cancer tissue arrays. P, p value. Analyses were carried out as described in the Experimental Procedures. (ii) Representative immunostaining of
SCCHN sections with anti-KPC1 or anti-p50. SI, staining intensity from 3 (strong positive) to 0 (negative). Arrowheads point to nuclear staining. Scale bars,
100 μm.

(B) Statistical analysis of p50 and KPC1 staining in normal and cancerous head and neck, glial and breast tissues. “Average of KPC1 SI” represents mean
of sample staining (number of samples is indicated under “Sample size”). “KPC1 stained, %” and “nuclear p50, %” represent percent of samples stained
for KPC1 or nuclear p50. P, p value. The p value reflects the significance of difference between staining of normal and cancer tissue. SI, staining intensity;
N.S., non-significant.

See also Figure S6.

In Vitro Translation 

p105 or p100 were translated in vitro in the presence of L-[³⁵S]methionine
using the TNT T7 Quick reticulocyte lysate-based coupled
transcription-translation kit according to the manufacturer’s instructions.

In Vitro Conjugation and Processing of p105 

Ub conjugation and processing of ³⁵S-labeled p105 were carried out in a
reconstituted cell-free system containing crude Fraction II as described
(Kravtsova-Ivantsiv et al., 2009). For conjugation, 1 μg of purified
Kpc1-FLAG-TEV-6xHIS, Kpc1I1256A-FLAG-TEV-6xHIS, or 6xHis-KPC2 was
added as indicated instead of Fraction II.

Ub Conjugates in Cells 

HEK293 cells were transfected with control siRNA or siRNA against KPC1 as
described above. After 24 hr, the cells were transfected with cDNAs coding
for FLAG-p105 proteins along with cDNAs coding for HA-Ub and Myc-KPC1,
or with an empty vector. After an additional 24 hr, the proteasome inhibitor
MG132 (20 μM) was added for 3 hr, and the cells were lysed with RIPA buffer
supplemented with freshly dissolved iodoacetamide and N-ethylmaleimide
(5 mM each) to inhibit deubiquitinating enzymes. p105 (both free and
ubiquitinated) and free p50 were immunoprecipitated with immobilized
anti-FLAG. The beads were washed five times with RIPA buffer and proteins
were resolved by SDS-PAGE. Free and conjugated p105 (and free p50) were
visualized using anti-FLAG.

Tumorigenicity 

Cell-based (soft agar) and animal (mouse xenograft) tumorigenicity assays are
described in the Extended Experimental Procedures.

RNA-Seq Analysis 

RNA from U87-MG xenografts was isolated using an RNA purification kit and
analyzed using the Illumina HiSeq 2500 analyzer. Identification and clustering
of the human genes are described in the Extended Experimental Procedures.

Immunohistochemistry and Statistical Analysis 

The staining technique and statistical analysis of the staining data of SCCHN, 
breast cancer, and glioblastoma were performed as described in the Extended 
Experimental Procedures. 
transcripts reported in this paper is deposited in NCBI GEO under accession

SUMMARY

Burkholderia pseudomallei and B. mallei are bacte- 
rial pathogens that cause melioidosis and glanders, 
whereas their close relative B. thailandensis is non- 
pathogenic. All use the trimeric autotransporter 
BimA to facilitate actin-based motility, host cell 
fusion, and dissemination. Here, we show that BimA 
orthologs mimic different host actin-polymerizing 
proteins. B. thailandensis BimA activates the host 
Arp2/3 complex. In contrast, B. pseudomallei and 
B. mallei BimA mimic host Ena/VASP actin polymerases
in their ability to nucleate, elongate, and bundle
filaments by associating with barbed ends, as well as 
in their use of WH2 motifs and oligomerization for 
activity. Mechanistic differences among BimA ortho- 
logs resulted in distinct actin filament organization 
and motility parameters, which affected the efficiency 
of cell fusion during infection. Our results identify 
bacterial Ena/VASP mimics and reveal that patho- 
gens imitate the full spectrum of host actin-polymer- 
izing pathways, suggesting that mimicry of different 
polymerization mechanisms influences key parame- 
ters of infection. 



INTRODUCTION 

The pseudomallei group of Burkholderia species are Gram- 
negative bacteria that include the opportunistic human patho- 
gens B. pseudomallei (Bp), which causes the severe disease 
melioidosis, and B. mallei (Bm), a clonal descendant of Bp that 
primarily infects equines but can cause acute human disease
(Cheng and Currie, 2005; Wilkinson, 1981). Bp and Bm are of 
heightened concern because they are resistant to numerous an- 
tibiotics, spread via an aerosol route, and exhibit low infectious 
doses. A third species, B. thailandensis (Bt), is closely related
yet is not pathogenic to humans. All three share key virulence 
factors despite their differences in infectivity. Because Bt infec- 
tion in animals and cells recapitulates many features of Bp and 
Bm virulence, it has been used as a model system to study 
Burkholderia pathogenesis (Galyov et al., 2010; Haraga et al., 
2008; West et al., 2008). However, the basis for the dramatic 
differences in Burkholderia virulence remains a mystery.
Understanding these differences will provide insight into the evolution
of virulence among closely related bacteria.

Key features of Burkholderia virulence include their ability to 
invade host cells and escape from phagosomes into the cytosol, 
where they replicate and undergo actin-based motility (Kespi- 
chayawattana et al., 2000; Stevens et al., 2005a). As with other 
pathogens, Burkholderia use actin-based motility to facilitate 
cell-to-cell spread, which enables dissemination within hosts 
while evading the immune system (Goldberg, 2001). However, 
Burkholderia spread is distinct in that it can occur through
bacterial-mediated fusion of host cells (French et al., 2011;
Kespichayawattana et al., 2000) induced by type VI secretion
(Schwarz et al., 2014; Toesca et al., 2014), rather than the
engulfment of membrane protrusions. Regardless of the spread
mechanism, motility drives bacteria to the host plasma membrane to
facilitate spread.

To enable actin-based motility, pathogens express proteins 
that mimic or activate host actin-nucleating factors to promote 
the polymerization of actin monomers (G-actin) into filaments 
(F-actin) that elongate at fast-growing barbed ends. Listeria 
monocytogenes and Shigella flexneri produce mimics or activa- 
tors of host nucleation-promoting factors (NPFs) to stimulate the 
host Arp2/3 complex (Welch and Way, 2013), which constructs 
branched actin filament networks (Campellone and Welch, 
2010). Rickettsia species use an NPF that activates the Arp2/3 
complex early in infection (Reed et al., 2014) and later use a 
mimic of host formins (Haglund et al., 2010; Kleba et al., 2010; 
Madasu et al., 2013), which nucleate filaments and processively 
bind to barbed ends to enhance elongation rates (Paul and 
Pollard, 2009). Vibrio and Chlamydia species, which do not 
undergo motility, express proteins that use tandem actin mono- 
mer-binding sequences to nucleate F-actin (Jewett et al., 2006; 
Liverman et al., 2007; Namgoong et al., 2011; Pernier et al., 2013;
Tam et al., 2007; Yu et al., 2011). Despite this diversity, no path- 
ogens have been shown to mimic host Ena/VASP proteins, 
which are weak nucleators, bind barbed ends, enhance filament 
elongation, and bundle filaments (Barzik et al., 2005; Breit- 
sprecher et al., 2011; Hansen and Mullins, 2010; Krause et al., 
2003; Samarin et al., 2003; Winkelman et al., 2014). Within 
this context, it is unclear whether Burkholderia mimics or acti- 
vates host nucleators or nucleates actin by an unknown mecha- 
nism. Additionally, the consequences of different nucleation 
mechanisms on actin-based motility and infection have not 
been examined. 

Burkholderia actin-based motility requires the Burkholderia 
intracellular motility A (BimA) protein (Schell et al., 2007; Sitthidet 
et al., 2010; Stevens et al., 2005b), a member of the trimeric
autotransporter (AT) family. Trimeric ATs contain highly conserved
C-terminal sequences that mediate secretion, localization to the
outer membrane, and trimerization (Cotter et al., 2005; Dautin
and Bernstein, 2007). These include a β barrel, as well as an adjacent
~35 residue α helix (Stevens et al., 2005b), which in other
trimeric ATs forms a trimeric coiled coil that is positioned 
inside the barrel (Meng et al., 2006). Trimeric ATs also contain 
N-terminal passenger domains that are exposed on the bacterial 
surface. Sequence comparisons suggest that the passenger do- 
mains of BimA from Bt, Bp, and Bm polymerize host actin by 
different mechanisms (Stevens et al., 2005b). BtBimA contains 
putative actin-binding WASP homology 2 (WH2) and Arp2/3- 
binding central and acidic (CA) motifs that are conserved among 
NPFs that activate the Arp2/3 complex (Campellone and Welch, 
2010). Previous work showed that the BtBimA CA motifs are 
required to activate Arp2/3 nucleation in vitro and promote actin 
association in host cells (Sitthidet et al., 2010). In contrast, 
BpBimA and BmBimA contain three (Bp) or one (Bm) putative 
WH2 motifs and lack CA sequences. Although two of the three 
putative BpBimA WH2 sequences were shown to be required 
for actin binding, nucleation, and plaque formation (Sitthidet 
et al., 2011), the activity of BmBimA has not been examined. It 
also remains unclear what molecular mechanisms BpBimA and 
BmBimA use to nucleate actin and the role trimerization plays 
in this process. 

Here, we investigated the mechanisms of actin assembly and 
motility driven by BimA orthologs from different Burkholderia 
species. Despite the conservation of key virulence factors 
(French et al., 2011; Galyov et al., 2010; Haraga et al., 2008), 
we found that Bt, Bp, and Bm have evolved divergent mecha- 
nisms for actin polymerization. These differences result in 
Burkholderia that generate actin tails with distinct filament archi- 
tectures and exhibit different efficiencies of motility and host-cell 
fusion, providing one potential explanation for the increased viru- 
lence of Bp and Bm relative to Bt. These observations demon- 
strate that intracellular pathogens employ the full spectrum of 
actin assembly pathways and suggest that distinct mechanisms 
allow microbes to fine-tune motility to control spread during 
infection. 

RESULTS 

BtBimA Activates the Host Arp2/3 Complex to Nucleate 
Branched Filaments, whereas BpBimA and BmBimA 
Independently Nucleate Unbranched Filaments 

Previous studies indicated that BtBimA requires its CA motifs
and the host Arp2/3 complex for actin nucleation, whereas
BpBimA nucleates actin independent of Arp2/3 (Sitthidet et al.,
2010; Stevens et al., 2005b). The activity of BmBimA was not
investigated. In these studies, incomplete BtBimA and BpBimA
passenger domains that lacked trimeric coiled coils were fused
to GST, which would induce non-native dimerization (Sitthidet
et al., 2010, 2011; Stevens et al., 2005b). The activity of these
constructs was modest (~1.5- to 2.5-fold increased polymerization
rates). We hypothesized that the full passenger domains
in their native oligomeric states would exhibit higher nucleation
activities that would be dependent (BtBimA) or independent
(BpBimA, BmBimA) of Arp2/3 complex. To test this, we purified
versions of each ortholog that spanned the passenger domain
and extended through the trimeric coiled coil (Figure 1A) (Meng
et al., 2006; Szczesny and Lupas, 2008). We used pyrene
actin-polymerization assays to test each purified BimA for activity.
BtBimA alone had little nucleation activity (Figures S1A and
S1B), yet it activated the Arp2/3 complex in a concentration-dependent
manner (>5-fold increase relative to actin alone),
indicating that it functions as an NPF (Figures 1B and 1C). In
contrast, BpBimA and BmBimA displayed robust nucleation activity
(>7- or 8-fold increase relative to actin alone), and Arp2/3
addition had no effect (Figure S1B). To compare their potencies,
we determined the time it took to reach half-maximum fluorescence
over a range of BimA concentrations and fit the resulting
curves (Figure 1C). All three orthologs are potent nucleators
with 230 nM BtBimA (with 50 nM Arp2/3), 60 nM BpBimA, and
5 nM BmBimA required for half-maximum activity. Consistent
with its ability to activate the Arp2/3 complex, BtBimA and
Arp2/3 generated branched actin filaments (Figure 1D). In contrast,
BpBimA and BmBimA generated unbranched filaments.
Thus, BtBimA is an NPF that activates Arp2/3, whereas BpBimA
and BmBimA independently nucleate actin, similar to host proteins
in the formin, tandem monomer nucleator, and Ena/VASP
families.
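
As an illustration of the potency readout described above, the following minimal sketch estimates the time to half-maximum fluorescence from a single polymerization trace by linear interpolation and normalizes it to an actin-alone control. The function name and the traces are hypothetical, and the baseline and plateau definitions are simplifying assumptions rather than the procedure used for Figure 1C.

```python
def time_to_half_max(times, fluorescence):
    """Return the time at which the signal first reaches half of its rise.

    The half-maximum level is taken midway between the first reading
    (baseline) and the largest reading (plateau); the crossing time is
    found by linear interpolation between the two flanking time points.
    """
    baseline = fluorescence[0]
    plateau = max(fluorescence)
    half = baseline + 0.5 * (plateau - baseline)
    for i in range(1, len(times)):
        lo, hi = fluorescence[i - 1], fluorescence[i]
        if lo < half <= hi:
            frac = (half - lo) / (hi - lo)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("signal never crosses its half-maximum level")


# Hypothetical traces (seconds, arbitrary fluorescence units).
times = [0, 100, 200, 300, 400, 500]
actin_alone = [0.0, 0.5, 1.5, 3.0, 5.0, 6.0]
with_bima = [0.0, 3.0, 5.5, 6.0, 6.0, 6.0]

# Normalize to the actin-alone control; a ratio below 1 means faster assembly.
ratio = time_to_half_max(times, with_bima) / time_to_half_max(times, actin_alone)
print(f"normalized time to half-maximum = {ratio:.2f}")
```

Repeating this over a range of BimA concentrations would give the concentration-response curve from which a half-maximal concentration can be fit.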

BpBimA and BmBimA Bind Filament Barbed Ends, 
Processively Elongate Filaments, and Remove Capping 
Protein from Filaments 

Formins and Ena/VASP proteins processively track barbed ends
and alter their elongation rates (Paul and Pollard, 2009), whereas 
tandem monomer nucleators have not been shown to track 
growing filament ends (Carlier et al., 2011; Namgoong et al., 
2011; Pernier et al., 2013). To compare BimA behavior to that 
of members of these families, we investigated how BimA ortho- 
logs interact with filament ends and tested whether they affect 
elongation. We labeled BpBimA and BmBimA with Alexa Fluor 
488 at Cys residues that were introduced at their C termini and 
used total internal reflection fluorescence (TIRF) microscopy to 
monitor the effect of BimA on rhodamine-labeled actin polymer- 
ization. BmBimA and BpBimA bound and remained associated 
with barbed ends as they grew (Figure 2A; Movies SI and S2). 
Growth rate measurements of barbed ends in the presence or 
absence of BimA indicated that BmBimA and BpBimA increased 
elongation rates by 3.5- to 4.5-fold relative to actin alone. To 
measure their apparent affinities for barbed ends, we monitored 
elongation of pre-polymerized actin seeds with increasing BimA 
concentrations and fit the concentration dependence of the 
decrease in time to half-maximum fluorescence (Figure 2B). 
These measurements produced apparent affinities for barbed 
ends of 350 nM for BpBimA and 3 nM for BmBimA, similar to 
the barbed-end affinities of formin and Ena/VASP proteins (Han- 
sen and Mullins, 2010; Harris et al., 2004; Otomo et al., 2005; 
Pruyne et al., 2002; Winkelman et al., 2014). 

Ena/VASP proteins are further distinguished from formins by
the ability of at least one family member (Ena) to bundle and elongate
two filaments at once (Winkelman et al., 2014). BmBimA
and BpBimA also frequently gathered two or even three filaments
and mediated their simultaneous elongation (Figures 2C
and S2; Movies S3, S4, S5, and S6). Bundled filaments bound by
a single BimA spot grew at similar rates relative to each other
(Figure 2D) and to individual BimA-elongated filaments
(Figure 2E). Thus, in this regard, BpBimA and BmBimA activity
closely resembles that of Ena/VASP proteins.

Figure 1. BtBimA Mimics Host NPFs, whereas BpBimA and BmBimA Independently Nucleate Actin

(A) Domain schematics of BimA from different Burkholderia species. Actin-related and oligomerization sequences are shown. Numbers refer to amino acid
position in full-length proteins. W, WASP homology 2; C, central; A, acidic; P, proline rich; Coil, trimeric coiled coil.

(B) Pyrene actin polymerization reactions with increasing BimA concentrations with Arp2/3 complex (Bt) or in its absence (Bp, Bm).

(C) The time to half-maximum fluorescence of the polymerization curves in (B) normalized to actin alone. The means ± SD are shown, and fits of each data set are
reported to estimate the concentration at which half-maximum activity is observed.

(D) Epifluorescence images of polymerization reactions containing BtBimA with Arp2/3 complex, BpBimA, or BmBimA after 15 min that were stabilized and
stained with rhodamine phalloidin.

Scale bars, 5 μm. See also Figure S1.

In cells, barbed ends are often capped by CapZ, which pre- 
vents filament elongation (Cooper and Sept, 2008). Purified 
CapZ displays a high affinity of ~2 nM for barbed ends (Caldwell 
et al., 1989) and a slow dissociation rate from barbed ends 
(~30 min half-life) (Schafer et al., 1996). Formins and Ena/ 
VASP proteins compete with CapZ to prevent capping (Barzik 
et al., 2005; Breitsprecher et al., 2008; Paul and Pollard, 2009) 
but have not been reported to remove CapZ from pre-capped 
filaments. To test whether BpBimA and BmBimA have anti- 
capping activity, we monitored their ability to promote filament 
elongation when added to pre-formed actin seeds simulta- 
neously with CapZ (data not shown) or when added 5 min after 
filaments had been pre-capped with CapZ (Figure 2F). Under
both conditions, 10 nM CapZ alone completely inhibited elongation,
whereas the addition of 30 nM BpBimA or BmBimA to
pre-capped filaments allowed filament elongation. These results 
suggest that BpBimA and BmBimA have the unusual ability to 
remove CapZ from barbed ends to enable elongation. Together, 
our results demonstrate that BpBimA and BmBimA are barbed- 
end-binding proteins that increase elongation rates, bundle fila- 
ments, and have anti-capping activity, similar to host Ena/VASP 
actin polymerases. 

Oligomerization Is Required for BpBimA and BmBimA 
Activity 

One feature of Ena/VASP proteins required for their actin polymerase
activity is a tetrameric coiled coil located at the C terminus
(Breitsprecher et al., 2008; Hansen and Mullins, 2010; Kühnel
et al., 2004). Based on the functional similarities of BpBimA
and BmBimA with Ena/VASP proteins, we predicted that trimerization
is important for BimA barbed-end-binding and elongation
activities. To estimate the oligomeric state of wild-type and
mutant BimA proteins, we determined the behavior of wild-type
BimA on a gel-filtration column. BpBimA and BmBimA
eluted at volumes corresponding to globular proteins of much
larger molecular weight (>500 kDa) than expected based on their
sequences (127 kDa for BpBimA; 88 kDa for BmBimA; Figure 3B).
However, other trimeric ATs display aberrant behavior that is
attributed to their elongated structures (Cotter et al., 2005;
Hartmann et al., 2012; Mack et al., 1994). Next, we replaced eight
residues of the predicted α helix with aspartic acid (D) to disrupt
the formation of the trimeric coiled coil (Figure 3A). The resulting
BpBimASD and BmBimASD mutants eluted at smaller sizes
relative to their wild-type counterparts by gel filtration (Figure 3B),
suggesting that the higher oligomeric state of the wild-type proteins
was disrupted and that the trimeric coiled coil mediates
oligomerization.

Figure 2. BmBimA and BpBimA Processively Bind Growing Filament Barbed Ends, Increase Elongation Rates, Bundle Filaments, and
Outcompete CapZ for Barbed-End Binding

(A) Left, TIRF images showing BmBimA-AF488 and BpBimA-AF488 (green) and rhodamine-labeled actin (magenta). Time (s) is indicated. Scale bars, 3 μm. Right,
filament length (number of subunits × 1,000) over time for a minimum of 10 filaments with the mean elongation rate (sub/s ± SEM) listed.

(B) Left and middle, pyrene elongation assays with a range of BimA concentrations. Right, the time to half-maximum fluorescence normalized to actin alone for
BmBimA (green) or BpBimA (purple) with the mean apparent Kd ± SD listed.

(legend continued on next page)

Figure 3. BpBimA and BmBimA Oligomerization Is Required for Actin Nucleation and Barbed-End Binding

(A) Alignment of the trimeric coiled coils from BpBimA and BmBimA with the
Hia trimeric coiled coil (Meng et al., 2006). Positions replaced with
Asp residues in the BimASD mutants are outlined in black. Hydrophobic
residues are blue, and charged residues orange.

(B) Domain schematics of wild-type and BimASD mutants in which eight
positions were changed to Asp residues are shown above gel-filtration elution
profiles of wild-type and mutant proteins.

(C) The time to half-maximum fluorescence normalized to actin alone in
polymerization reactions with increasing BimASD proteins. The
means ± SD are shown with the wild-type data from Figure 1C for reference.

(D) Elongation reactions with 10 nM CapZ and with or without BimASD proteins.

To determine the importance of oligomerization in actin nucleation and
elongation, we compared the activity of each BimASD mutant with wild-type
BimA in pyrene actin assembly assays. Both BpBimASD and BmBimASD lacked
detectable nucleation activity, and at high concentrations, both mutants
inhibited polymerization, suggesting that they sequester G-actin (Figure 3C).
Neither mutant could relieve CapZ inhibition of elongation at concentrations
two orders of magnitude higher than that at which wild-type BimA relieved
inhibition (Figure 3D). Thus, oligomerization of BpBimA and BmBimA is
required for actin nucleation, barbed-end elongation, and anti-capping
activity, similar to the requirement of tetramerization
for Ena/VASP actin polymerase activity.

WH2 Motifs Are Required for BpBimA and BmBimA 
Activity 

Ena/VASP barbed-end binding and filament elongation require 
two WH2 sequences (also called the globular actin-binding or 
GAB and filamentous actin-binding or FAB motifs) (Bachmann 




(C) TIRF images of BmBimA-AF488 (arrowhead) elongating two (top) or three (bottom) filaments (colors as in A). 

(D) Graph of filament length (number of subs x 1 ,000) over time for pairs of co-elongating filaments (each pair is in matched colors of solid and dashed lines). 

(E) Mean elongation rates (sub/s ± SEM) for actin alone or for single filaments or two- and three-filament bundles elongated with BmBimA-AF488 (left, green) or 
BpBimA-AF488 (right, purple). Asterisks denote paired samples that are significantly different (one-way ANOVA; p < 0.0001). 

(F) Left and middle, elongation reactions first incubated with 1 0 nM CapZ, followed by BimA and G-actin addition. Right, the times to half-maximum fluorescence 
normalized to actin alone (dashed black line) for BmBimA (green) or BpBimA (purple) are shown with the means ± SD. 

See also Figure S2 and Movies SI , S2, S3, S4, S5, and S6. 
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Figure 4. WH2 Requirements for BpBimA and BmBimA G-Actin Binding and Nucleation 

(A) Alignment of predicted WH2 sequences from BimA orthologs with known WH2 (GAB and FAB) sequences from human VASP and Drosophila Ena. The 
conserved a helix and LKKT motif are indicated. Hydrophobic residues are blue, and charged residues orange. Boxed residues were mutated to AA. 

(B) Domain schematics of WH2 mutant BimA proteins. Predicted WH2 motifs that were mutated are outlined in blue and denoted by AA. 

(C) Anisotropy measurements of monomeric actin488 binding to BimA. Data are represented by circles, and Hill equation fits are shown as solid lines. The means 
Kd ± SD from at least two experiments are listed. 

(D) The time to half-maximum fluorescence normalized to actin alone in polymerization reactions with BimA. The means ± SD are shown with wild-type data from 
Figure 1C for reference. 



et al., 1999; Hansen and Mullins, 2010). BpBimA and BmBimA 
contain three (Bp) or one (Bm) putative WH2 sequences (Figure 1A)
(Stevens et al., 2005a, 2005b) that are implicated in actin
binding, and we hypothesized that each BimA would use these 
sequences for nucleation and elongation, similar to Ena/VASP
proteins. We determined whether the BimA WH2 sequences 
are competent for actin binding by replacing two positions of 
the conserved LKKT signature sequence with alanines (Figures 
4A and 4B). To measure G-actin binding, we titrated increasing 
wild-type or WH2 mutant BimA into a fixed concentration of 
Alexa Fluor 488-labeled G-actin (actin488) in low ionic strength 
buffer to prevent actin polymerization (Figure 4C). Anisotropy 
measurements with increasing BimA produced saturable bind- 
ing curves that were fit using the Hill equation. Wild-type BpBimA 
and BmBimA bound actin488 with apparent affinities of 350 nM 
and 670 nM (Figure 4C). Mutating any single predicted WH2 
sequence in BpBimA (BpW1, BpW2, and BpW3 mutants) or



mutating the first two WH2 motifs together (BpW1W2) had little
effect on G-actin binding (Figure 4C). However, simultaneously 
mutating all three WH2 sequences in the BpW1W2W3 mutant 
severely reduced binding (Kd ~13 μM). The single BmW mutant
also exhibited a significant reduction in affinity relative to wild-type
BmBimA (Kd ~17 μM). These data demonstrate that
BmBimA contains one WH2 motif capable of binding G-actin 
and suggest that BpBimA contains up to three WH2 motifs that 
bind G-actin. 
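
The Hill-equation fit described above can be illustrated with a minimal sketch. The article does not state which fitting software or Hill coefficient was used, so the brute-force grid search, the function names `hill` and `fit_hill`, the coefficient n = 1, and the noise-free synthetic titration below are all assumptions made for illustration; only the 350 nM apparent affinity is taken from the text.

```python
def hill(conc, kd, n):
    """Fraction of labeled actin bound at BimA concentration `conc`.

    `conc` and `kd` must share units (nM here); `n` is the Hill coefficient.
    """
    return conc**n / (kd**n + conc**n)


def fit_hill(concs, fractions, kd_grid, n_grid):
    """Least-squares fit by exhaustive search over candidate (kd, n) pairs.

    This is a deliberately simple stand-in for a nonlinear optimizer.
    """
    best = None
    for kd in kd_grid:
        for n in n_grid:
            sse = sum((hill(c, kd, n) - f) ** 2 for c, f in zip(concs, fractions))
            if best is None or sse < best[0]:
                best = (sse, kd, n)
    return best[1], best[2]


# Synthetic titration (hypothetical data) generated with kd = 350 nM, n = 1.
concs = [50, 100, 200, 400, 800, 1600, 3200]          # nM BimA
fractions = [hill(c, 350.0, 1.0) for c in concs]       # normalized signal

kd_grid = [10.0 * i for i in range(1, 201)]            # 10 to 2000 nM
n_grid = [0.5 + 0.1 * i for i in range(26)]            # 0.5 to 3.0
kd_fit, n_fit = fit_hill(concs, fractions, kd_grid, n_grid)
print(f"Kd = {kd_fit:.0f} nM, n = {n_fit:.1f}")
```

With real anisotropy data, the raw signal would first be normalized between the free and fully bound values, and a gradient-based optimizer would usually replace the grid search.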

To determine whether BimA WH2 motifs are required for fila- 
ment nucleation and elongation, we used the pyrene actin as- 
sembly assay to test each WH2 mutant for its ability to nucleate 
actin. Although the BpW1 and BpW2 mutants bound G-actin
with high affinity, they lacked nucleation activity, as evidenced 
by the absence of a decrease in time to half-maximum fluores- 
cence intensity (Figure 4D). Notably, these mutants inhibited 
polymerization at concentrations above ~1 μM, which is likely
Figure 5. BimA from Different Burkholderia Species Mediates the Formation of Distinct Actin Tails and Parameters of Actin-Based Motility

(A) Merged images showing Cos7 cells infected with different Bt strains that constitutively express RFP (magenta). F-actin was stained with Alexa Fluor 488 
phalloidin (green). Scale bars, 5 μm.

(B) Mean actin-tail lengths for the indicated strains in Cos7 and A549 cells. Boxes outline the 25th and 75th percentiles, midlines denote the medians, and whiskers
show minimum and maximum lengths. Asterisks denote lengths significantly different from BtBimA in that cell type (one-way ANOVA; p < 0.05). 

(C) Percent bacteria with actin tails (±SD). 

(D) Percent bacteria with actin tails (±SD) without or with 1 hr treatment with DMSO, the Arp2/3 inhibitor CK-666, or the inactive compound CK-689. nd, no 
difference. 



due to remaining intact WH2 motifs sequestering G-actin. In
contrast, the activity of BpW3 was similar to that of wild-type 
BpBimA, indicating that the third WH2 motif is not required for 
activity. BpW1W2 and BpW1W2W3 were unable to nucleate 
actin, and BpW1W2W3 also lacked inhibitory activity, consistent
with the low affinity of this mutant for G-actin (Figure 4C). Thus, 
although all three BpBimA WH2 motifs may bind actin mono- 
mers, only the N-terminal two WH2 motifs are required for nucle- 
ation activity. For BmBimA, the sole WH2 mutant BmW lacked 
detectable nucleation activity and did not inhibit actin polymeri- 
zation, indicating that this WH2 motif is required for nucleation 
(Figure 4D). Together with the requirement for oligomerization, 
these results suggest that BpBimA uses up to six and BmBimA 
uses up to three WH2 motifs within a BimA trimer to mediate 
actin nucleation and elongation. Thus, like Ena/VASP proteins,
BpBimA and BmBimA require oligomerization and WH2 motifs 
for their actin polymerase activities. 

Mechanistic Differences among BimA Orthologs 
Expressed in Bt Result in Distinct Filament Organization 
and Parameters of Actin-Based Motility 

To compare how differences in the biochemical properties of 
BimA orthologs affect actin filament organization and actin- 
based motility by Burkholderia, we replaced the endogenous 
copy of Bt bimA with an identical copy of Bt bimA or with bimA
from Bp or Bm. To assess BimA synthesis and localization, we 
engineered strains producing internally FLAG-tagged versions
of BimA. Bacteria producing FLAG-BimA were compared with
those expressing the corresponding untagged BimA orthologs
in plaque assays and formed identical numbers and sizes of plaques
as the untagged versions (data not shown). We found that
FLAG-BtBimA and FLAG-BmBimA were produced at similar
levels, whereas FLAG-BpBimA was produced at 2- to 2.5-fold 
higher levels than the others (Figure S3A). All displayed polar 
localization similar to endogenous BimA in Bp (Figure S3B) 
(Sitthidet et al., 2011; Stevens et al., 2005b) and also enabled 
actin-tail formation by Bt following infection of tissue culture cells
(Figure 5A), similar to previous results for BimA orthologs in Bp 
(Stevens et al., 2005a). Unless otherwise noted, data were 
obtained using strains expressing untagged BimA. 

Confocal microscopy of actin-tail structures generated by 
each strain indicated that tails produced by BtBimA were curved 
and consisted of a dense actin network (Figure 5A; FLAG-BimA
images are shown). In contrast, BpBimA and BmBimA produced 
longer, straighter tails that consisted of bundled F-actin strands 
(Figure 5B). Interestingly, the proportion of BpBimA bacteria 
associated with tails was moderately reduced and that of 
BmBimA bacteria was severely reduced relative to BtBimA bac- 
teria (Figure 5C), suggesting differences in motility initiation. The 
Arp2/3 complex localized specifically to BtBimA tails, but not 
BpBimA or BmBimA tails, as monitored by immunofluorescence 



staining and by the presence of Arp3-GFP (Figures 5E and S3D- 
S3F). Furthermore, treatment of infected cells with the Arp2/3 
complex inhibitor CK-666 (but not the control compound CK- 
689) reduced the frequency of tail formation by BtBimA, whereas 
tails formed by BpBimA or BmBimA were unaffected (Figure 5D). 
Thus, BtBimA requires Arp2/3 complex activity to produce 
shorter and curved actin tails, whereas the Ena/VASP mimics
BpBimA and BmBimA work independent of Arp2/3 to produce 
longer tails with bundled filaments. 

We next used time-lapse imaging to compare actin-based 
motility parameters among strains producing different BimA or- 
thologs. Once movement was initiated, all three strains exhibited 
similar average velocities of ~30 μm/min, with rates varying
widely for each strain (Figure S3C). However, the movement 
paths of the BtBimA strain were more curved, whereas those 
of BpBimA and BmBimA bacteria were straighter (Figure 5F; 
Movie S7). To quantify path straightness and motility efficiency, 
we divided the linear displacement by the total distance traveled 
over 40 s for each bacterium. By this criterion, the BpBimA and
BmBimA strains moved more efficiently than the BtBimA strain 
(Figure 5G). We also monitored motility following treatment 
with increasing concentrations of cytochalasin D (CD), a drug 
that binds to and inhibits growth of barbed ends. Although treat- 
ment with 250 or 500 nM CD reduced the frequency of motility for 
all strains, it completely blocked movement by the BtBimA strain, 
whereas some BpBimA and BmBimA bacteria resisted treat- 
ment and moved at rates similar to those in untreated cells (Fig- 
ure 5H). This is consistent with the ability of these proteins to 
relieve CapZ inhibition. Overall, these results demonstrate that 
Arp2/3-dependent and Ena/VASP-like mechanisms of actin 
nucleation and elongation can drive similar rates of actin-based 
motility. However, distinct motility mechanisms also result in dif- 
ferences in motility initiation, paths, efficiency, and susceptibility 
to inhibition by actin-disrupting drugs. 
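
The motility-efficiency measure described above (linear displacement divided by total distance traveled) can be sketched as follows. This is an illustrative reading of the text rather than the authors' analysis code; the function name `motility_efficiency`, the track format, and the example coordinates are hypothetical.

```python
import math


def motility_efficiency(track):
    """Ratio of net displacement to total path length for one bacterium.

    `track` is a list of (x, y) positions sampled over the analysis window
    (40 s in the text). Returns a value between 0 and 1; 1 means a perfectly
    straight path.
    """
    if len(track) < 2:
        raise ValueError("need at least two positions")
    # Total distance traveled: sum of step lengths between consecutive frames.
    path_length = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    if path_length == 0:
        return 0.0  # stationary bacterium; efficiency is undefined, report 0
    # Linear displacement: straight-line distance from first to last position.
    return math.dist(track[0], track[-1]) / path_length


straight = [(0, 0), (1, 0), (2, 0), (3, 0)]   # hypothetical straight track
curved = [(0, 0), (1, 0), (1, 1), (0, 1)]     # hypothetical U-shaped track
print(motility_efficiency(straight), motility_efficiency(curved))
```

Under this definition, two strains moving at the same speed can differ in efficiency purely because of path curvature, which is the distinction the text draws between the BtBimA strain and the BpBimA and BmBimA strains.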

The Mechanism of BimA Actin Assembly Impacts the 
Efficiency of Host Cell Fusion 

Bacterial-mediated host cell fusion is essential for Burkholderia 
virulence, and fusion depends on actin-based motility (French 
et al., 2011; Schwarz et al., 2014). We therefore hypothesized 
that differences in actin polymerization mechanisms and motility 
impact Burkholderia-mediated fusion. To compare fusion efficiencies,
strains producing different BimA orthologs were assessed
for their ability to form plaques on host cell monolayers.
BtBimA and BpBimA strains formed similarly sized plaques, but 
the BmBimA strain produced smaller plaques in both Cos7 and 
A549 cells (Figure 6A). This plaque defect did not correlate with a 
reduction in BmBimA expression, a defect in localization (Figures 
S3A and S3B), intracellular replication (data not shown), slower 
rate of motility (Figure 5D), or differences in movement paths 
(Figure 5F). Instead, the reduced plaque size strongly correlated 



(E) Mean actin, Arp2/3, or non-specific 488 nm fluorescence intensities (× 1,000) from at least 10 actin tails (±SEM) are plotted along the first 12.5 μm of tail.

(F) Tracks (ten per strain) depicting motility over 100 s for bacteria in Cos7 cells.

(G) Motility efficiency from at least 80 tracks per strain, calculated as described in the text.

(H) Motility velocity over 40 s for each strain in Cos7-Lifeact-EGFP cells untreated or treated with DMSO or CD. Mean frequencies of motile bacteria are listed.
For all panels except (B), asterisks denote paired samples that are statistically different from one another (one-way ANOVA; p < 0.0001). See also Figure S3 and
Movie S7.
Figure 6. BimA Mechanisms of Actin Nucleation Impact Actin Tail Formation and Host Cell Fusion

(A) Plaque diameters in Cos7 or A549 cell monolayers infected with Bt expressing wild-type BtBimA, BpBimA, or BmBimA. 

(B) Merged images of Cos7 cells infected with wild-type or mutant BimA-expressing Bt strains that constitutively express RFP (magenta). F-actin was stained with 
Alexa Fluor 488 phalloidin (green). Scale bars, 5 μm.

(C) Plaque diameters in Cos7 or A549 cell monolayers infected with Bt expressing wild-type or mutant BimA. 

(A and C) Boxes outline the 25th and 75th percentiles, midlines denote the medians, and whiskers show minimum and maximum tail lengths. Asterisks denote
significantly different sample pairs (one-way ANOVA; p < 0.05). See also Figure S4. 



with a low frequency of actin-tail association (<5%; Figure 5C), 
suggesting that the initiation of actin-based motility is a crucial 
parameter for Burkholderia host cell fusion. 

We investigated the mechanisms used by BpBimA and 
BmBimA to enable intracellular motility and fusion by testing 
the prediction that each BimA would require one or more WH2 
motifs to drive movement during infection. We generated Bt 
strains in which we replaced endogenous bimA with genes en- 
coding WH2 mutants (including BpW1, BpW2, BpW3, and 
BmW). The synthesis and localization of BimA in each strain 
were similar to those of wild-type BimA as determined by immu- 
nofluorescence staining against internal FLAG tags (Figure S4). 
Consistent with their lack of polymerization activity, the BpW1 
and BpW2 mutants had no detectable association with F-actin, 
whereas the BpW3 mutant generated actin tails similar to wild- 
type BpBimA (Figure 6B). The BmW mutant similarly lacked 
actin association. We next examined the efficiency of fusion 
by measuring the plaque-forming ability of the mutants. The 
BpW1, BpW2, and BmW strains were completely defective in



plaque formation, whereas the BpW3 strain generated a similar 
number and size of plaques as the wild-type BpBimA strain (Fig- 
ure 6C). Thus, BpBimA requires WH2 motifs W1 and W2, and 
BmBimA requires its sole WH2 motif for actin-based motility 
and fusion during Bt infection. These results demonstrate that 
molecular mimicry of host Ena/VASP proteins is crucial for
actin-based motility driven by BpBimA and BmBimA in host 
cells. 

DISCUSSION 

Microbe-driven actin-based motility plays a crucial role in spread 
and virulence of many pathogens and is also used as a model 
system to understand actin dynamics in host cells. Almost all 
motile pathogens use an actin-polymerization mechanism that 
depends on the host Arp2/3 complex (Truong et al., 2014; Welch
and Way, 2013), with the exception of Rickettsia species that 
also use a formin-like mechanism late in infection (Haglund 
et al., 2010; Madasu et al., 2013; Reed et al., 2014). Here we 
Figure 7. A Model for BtBimA, BpBimA, and BmBimA Nucleation and
Formation of Actin Tails and Multinucleated Giant Cells 

(A) BtBimA activates host Arp2/3 complex using its WCA domain to nucleate 
branched actin networks. BpBimA nucleates and elongates filaments from 
barbed ends using its two N-terminal WH2 motifs, whereas BmBimA exhibits 
the same activities using its single WH2 motif to generate bundled filaments. 

(B) BtBimA and Arp2/3 form shorter tails with branched actin networks. 
BpBimA and BmBimA produce longer tails of bundled filaments. 

(C) Burkholderia synthesizing BtBimA or BpBimA form more actin tails and 
generate larger multinucleated giant cells compared with bacteria producing 
BmBimA. 



show that virulent Burkholderia species employ a previously un- 
described mechanism to co-opt actin for motility, mimicking 
Ena/VASP actin polymerases to nucleate, elongate, and bundle
filaments. 

In contrast to BtBimA activation of the Arp2/3 complex, we 
found that BpBimA and BmBimA function independently of 
Arp2/3. Both directly nucleate actin, bind processively to fila- 
ment barbed ends, and increase the elongation rate of bundled 



filaments (Figure 7). Moreover, they have the unusual ability to 
dissociate CapZ from barbed ends, a property they share with 
the Vibrio cholerae VopF protein (Pernier et al., 2013), as well 
as to resist the action of the drug CD during infection. These activities
most closely mirror those of the Ena/VASP family of actin
polymerases (Krause et al., 2003), which similarly bind to fila- 
ment barbed ends, processively elongate filaments, and under 
some conditions protect growing barbed ends from CapZ inhibi- 
tion (Barzik et al., 2005; Bear et al., 2002; Breitsprecher et al., 
2008; Hansen and Mullins, 2010; Schirenbeck et al., 2006; Win- 
kelman et al., 2014). Perhaps the most striking parallel between 
the activities of these proteins is their shared ability to gather and 
elongate multiple filaments simultaneously, which is a recently 
described property of Drosophila Ena (Winkelman et al., 2014). 
This enables BpBimA and BmBimA to generate networks of 
bundled filaments similar to those formed by Ena/VASP proteins 
in mammalian cells (Krause et al., 2003). 

BpBimA and BmBimA also display sequence requirements 
similar to those of Ena/VASP for their activity, including both 
WH2 and coiled coils. BimA trimerization would result in the 
close positioning of multiple WH2 sequences (three for BmBimA 
and up to nine for BpBimA), which may promote elongation 
through the proximity of G-actin and barbed-end-binding sites 
(Figure 7). Higher-order oligomerization of BpBimA or BmBimA 
may also contribute to their ability to bundle and elongate multi- 
ple filaments, similar to the tetramerization requirement and role 
of clustering in Ena/VASP protein processivity (Bachmann et al., 
1999; Breitsprecher et al., 2008; Hansen and Mullins, 2010; Sa- 
marin et al., 2003). 

Our analysis also revealed that Bt producing different BimA or- 
thologs generate actin tails with distinct organization (Figure 7) 
and move differently within host cells. BtBimA bacteria generate 
shorter and curvier tails that are densely packed with actin. 
Although they move at velocities similar to bacteria expressing 
other orthologs, they exhibited lower motility efficiency due to 
their curvier paths. These characteristics are similar to those of 
L. monocytogenes, which uses the NPF ActA to activate Arp2/3
and polymerize actin (Welch et al., 1998). In contrast, Bt synthe- 
sizing BpBimA or BmBimA produce longer, straighter tails 
consisting of bundled filaments and exhibit a higher motility efficiency
due to straighter paths. The bundled filament organization
in these tails is consistent with the Ena/VASP-like activities
of BpBimA and BmBimA in vitro. Thus, virulent Burkholderia spe- 
cies have evolved a distinct molecular mechanism for efficient 
actin-based motility. 

Actin-based motility driven by BimA orthologs also differs in 
promoting host cell fusion. In particular, the BmBimA strain ex- 
hibited reduced fusion efficiency relative to the BtBimA and 
BpBimA strains. Reduced fusion did not correlate with altered 
velocity, directionality of movement, or even the underlying poly- 
merization mechanism. Fusion did, however, correlate with a 
~5- to 10-fold reduction in the frequency of BmBimA-driven 
tail formation and motility initiation. This defect is surprising 
considering that BmBimA possesses more potent nucleation 
and elongation activity than BpBimA in vitro. Thus, despite its 
increased activity in isolation, BmBimA appears to be a less effi- 
cient actin nucleator in cells. Differences between the activity of 
BimA outside versus inside cells may be due to differential 



Cell 161, 348-360, April 9, 2015 ©2015 Elsevier Inc. 357 







requirements for other bacterial or host factors that play impor- 
tant roles in motility. For example, BmBimA may require addi- 
tional host actin regulators that are present in equine cells but 
absent from human cells. Regardless, these findings suggest 
that the ability of BimA to initiate actin-based motility is a partic- 
ularly crucial parameter to enable host cell fusion. 

Why has BimA evolved distinct mechanisms of generating 
actin filaments? Bp is a soil saprophyte that infects a broad range 
of mammals (Galyov et al., 2010), and Bm is a clonal descendant
of Bp that has undergone reductive genome evolution and cannot 
survive outside of a host (Galyov et al., 2010; Godoy et al., 2003;
Nierman et al., 2004). Bt is a soil saprophyte like Bp, but its natural 
host is unknown. The sequence and mechanistic similarities 
between BpBimA and BmBimA reflect their evolutionary lineage, 
and it is interesting that the length and complexity of BmBimA 
have decreased relative to those of BpBimA, mirroring changes 
in the sizes of their respective genomes. The evolution of 
BtBimA is less clear. We speculate that distinct BimA mecha- 
nisms evolved in response to different host cell environments; 
whereas BpBimA is optimized for infection of various mammals, 
BmBimA is fine-tuned to function in equines, and BtBimA has 
adapted to unknown hosts in the soil environment. 

As a whole, our results highlight that even closely related spe- 
cies have evolved to mimic a diverse spectrum of host actin-poly- 
merizing pathways and that mimicry of different polymerization 
mechanisms may influence key parameters of infection, such 
as host cell fusion and bacterial dissemination. The acquisition 
of distinct polymerization mechanisms may be relevant for the 
evolution of virulence, as both highly virulent Burkholderia species
evolved mimics of host Ena/VASP proteins that nucleate,
elongate, and bundle actin filaments, as well as remove filament 
capping proteins. However, the ramifications of these actin-reg- 
ulatory capabilities on virulence remain to be appreciated. Study- 
ing the divergent actin-based motility mechanisms of closely 
related species represents a powerful approach to unravel the 
evolution of pathogenic strategies for exploiting actin and to 
reveal new principles that govern the generation, dynamics, 
and regulation of distinct actin networks in cells. 

EXPERIMENTAL PROCEDURES 

Protein Purification and Fluorescent Labeling

SUMO-6XHis-BimA orthologs were expressed in E. coli and isolated by affin- 
ity, anion-exchange, and/or gel-filtration chromatography. Alexa Fluor 488 C5 
maleimide (Life Technologies) was used to label Cys-containing BpBimA and 
BmBimA. Details of the purification and labeling methods are in the Extended 
Experimental Procedures. 

Bulk Actin Assembly Assays

Pyrene actin polymerization reactions contained rabbit skeletal muscle G-actin
(1 or 3 μM, 10% pyrene labeled) and BimA. Elongation assays were performed
by mixing F-actin with BimA and initiated by adding G-actin (250 nM).
For CP competition, 10 nM CP was mixed with F-actin seeds, BimA was added
after 5 min, and reactions were initialized with G-actin (250 nM). For buffers and 
further details, see Extended Experimental Procedures. 

BimA Gel-Filtration Chromatography 

BimA proteins were run over a Superdex 200 10/300 GL gel filtration column 
(GE Healthcare) and detected by absorbance at 230 nm. See Extended Exper- 
imental Procedures for further details. 



Epifluorescence and TIRF Microscopy to Visualize Actin Filaments 

For epifluorescence microscopy, polymerization reactions were stabilized 
by adding 1 μM rhodamine-phalloidin (Life Technologies), and F-actin was
visualized as described previously (Haglund et al., 2010). TIRF microscopy
was performed as previously described (Kovar et al., 2006), except that reactions
contained 1 μM ATP-actin (33% rhodamine-labeled). At least 10 filaments
without BimA, or 20 filaments with 100 pM BpBimA or 10 nM BmBimA,
were measured over 240 s by manually tracing filaments. See Extended Exper- 
imental Procedures for further details. 

G-Actin Binding Anisotropy 

BimA proteins were added to 100 nM Alexa Fluor 488-labeled G-actin under 
non-polymerizing conditions. Binding curves were fit using the Hill equation, 
and the means of at least two titrations are reported. See Extended Experi- 
mental Procedures for further details. 

Bacterial Strain Construction

Parental Bt strain E264 was a gift from P. Cotter (University of North Carolina, 
Chapel Hill, NC, USA). BtbimA and BtmotA2 strains were gifts from C. French 
and J. F. Miller (University of California, Los Angeles, CA, USA). Details of strain 
construction are in the Extended Experimental Procedures and in Tables S1
and S2. 

B. thailandensis Infection of Mammalian Cells

A549, COS7, HEK293T, and U2OS cells were from the University of California,
Berkeley tissue culture facility. Polyclonal cell lines stably expressing Lifeact- 
EGFP (COS7) or F-Tractin-Wasabi (A549) were generated by lentiviral trans- 
duction. For infections, bacteria grown in LB broth were added to pre-seeded 
mammalian cells for 1 hr at 37°C at high MOIs (100-200) before adding media 
with gentamicin (Gm; 0.5 mg/ml). Fixed and live-cell analyses were performed 
with samples at 8-15 hpi. For plaque assays, cells were infected (MOI of 0.1)
for 1 hr and overlaid with a mixture of agarose in media with Gm and imaged 
by neutral red staining at 36 hpi. See Extended Experimental Procedures for 
further details. 

Image Analysis 

Image analysis details are in the Extended Experimental Procedures. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, four 
figures, two tables, and seven movies and can be found with this article online 
at http://dx.doi.org/10.1016/j.cell.2015.02.044.
SUMMARY

Contact inhibition of locomotion (CIL) is a multifac- 
eted process that causes many cell types to repel 
each other upon collision. During development, 
this seemingly uncoordinated reaction is a critical 
driver of cellular dispersion within embryonic tissues. 
Here, we show that Drosophila hemocytes require a 
precisely orchestrated CIL response for their devel- 
opmental dispersal. Hemocyte collision and sub- 
sequent repulsion involves a stereotyped sequence 
of kinematic stages that are modulated by global 
changes in cytoskeletal dynamics. Tracking actin 
retrograde flow within hemocytes in vivo reveals syn- 
chronous reorganization of colliding actin networks 
through engagement of an inter-cellular adhesion. 
This inter-cellular actin-clutch leads to a subsequent 
build-up in lamellar tension, triggering the develop- 
ment of a transient stress fiber, which orchestrates 
cellular repulsion. Our findings reveal that the phys- 
ical coupling of the flowing actin networks during 
CIL acts as a mechanotransducer, allowing cells to 
haptically sense each other and coordinate their 
behaviors. 

INTRODUCTION 

Contact inhibition of locomotion (CIL), which is a cessation of 
forward movement upon migratory collision, is a process com- 
mon to many cell types (Abercrombie and Heaysman, 1953; 
Astin et al., 2010; Dunn and Paddock, 1982; Gloushankova 
et al., 1998) that has recently been revealed to behave as a 
migratory cue for developmentally dispersing populations of 
cells during embryogenesis (Carmona-Fontaine et al., 2008; 
Davis et al., 2012; Stramer et al., 2010; Villar-Cerviño et al.,
2013). This multifaceted phenomenon requires cells to specif- 
ically recognize each other, modulate their migratory capacity, 
and depending on the cell-type, subsequently repolarize. As a 
result of this complexity, the mechanisms behind CIL are largely 
unknown, and it is additionally unclear how these various behaviors
during the process are integrated to induce a seamless
response. 

A range of inter-cellular adhesions and intracellular signaling 
pathways are postulated to be involved in CIL (e.g., Eph-ephrin 
[Astin et al., 2010], small GTPases [Carmona-Fontaine et al., 
2008], planar cell polarity pathway [Carmona-Fontaine et al., 
2008], and cell-cell adhesion [Gloushankova et al., 1998]). How- 
ever, it is unclear exactly how these various signals feed 
into the cytoskeletal machinery to control the response. More 
crucially, nothing is known about the actin dynamics involved 
in CIL. As a central aspect of CIL is a rapid change in migration, 
it is clear that to understand the mechanisms behind this 
phenomenon it will be crucial to elucidate the dynamics of the 
actin network during the response. 

During cell migration, the actin network provides the propul- 
sion that allows a cell to generate movement. The actin cytoskel- 
eton within the lamella of a migrating cell is in a constant state of 
retrograde flow. Actin polymerizes at the leading edge, which 
pushes the cell membrane forward. Subsequently, the force of 
polymerization against the membrane along with Myosin II driven 
contraction drives retrograde movement of the actin network; it 
is this treadmill that generates the forces behind cell motility. 
When a cell moves, cell-matrix receptors, such as integrins, 
become engaged and bind to the extracellular matrix. Integrin 
activation leads to a slowing of the actin flow at this integrin- 
based point of friction, and the force of the moving actin network 
is then transformed into extracellular traction stresses (Gardel 
et al., 2008). This integrin-dependent actin-clutch, and the resul- 
tant inverse correlation between actin flow and traction force, 
is hypothesized to be involved in the movement of numerous 
cell types. 

We have been exploiting the embryonic migration of 
Drosophila macrophages (hemocytes) to understand the regula- 
tory mechanisms of CIL and the function of this process during 
embryogenesis (Davis et al., 2012; Stramer et al., 2010). These 
cells develop from the head mesoderm and disperse throughout 
the Drosophila embryo taking defined migratory routes. One of 
these routes occurs just beneath the epithelium along the ventral 
surface where their superficial location in the embryo allows 
them to be imaged live at high spatio-temporal resolution 
approaching what can be achieved from cells in culture. This 
has revealed that hemocytes spread out to form an evenly 
distributed pattern beneath the ventral surface within a thin
acellular cavity (the hemocoel) (Stramer et al., 2010). We previ- 
ously developed a mathematical model of hemocyte dispersal, 
and computer simulations revealed that this uniform cell spacing 
may be driven by contact inhibition (Davis et al., 2012). Indeed, a 
similar analysis of Cajal-Retzius cell migration in the cerebral cortex
showed an identical requirement for CIL in their dispersion
(Villar-Cerviño et al., 2013), suggesting that CIL is a conserved
mechanism capable of generating tiled cellular arrays. 

Here, we show that hemocyte developmental dispersal re- 
quires precise contact inhibition dynamics. Quantification of 
changes in speed and direction during cellular collisions reveals 
that their CIL response is not stochastic but involves distinct 
kinematic stages that are synchronized between colliding part- 
ners. We also show that this choreographed movement involves 
a coordinated change in actin dynamics. Tracking actin flow 
within hemocytes in vivo reveals a physical coupling of the 
colliding actin networks through engagement of a transient 
inter-cellular adhesion. It is this “inter-cellular actin-clutch” and 
the coordinated build-up and release of lamellar tension in 
colliding cells that orchestrates their behaviors, allowing CIL to 
behave as an instructive migratory cue. 

RESULTS 

The Kinematic Steps of the CIL Response Are 
Synchronized in Colliding Hemocytes 

Hemocytes disperse evenly within the ventral hemocoel during 
Drosophila embryogenesis and their CIL dynamics can be precisely
analyzed during this process (Figure 1A; Movie S1) (Davis
et al., 2012; Stramer et al., 2010). To elucidate the migratory 
phases of CIL, we first analyzed the changes in acceleration 
throughout the response with reference to the time of microtubule
alignment between colliding hemocytes (Figure 1B;
Movie S1), which we previously revealed is a hallmark of CIL
that is associated with a change in hemocyte motility (Stramer 
et al., 2010). Our data revealed that there is a back acceleration 
upon microtubule alignment, signifying that cells were slowing 
down and/or changing direction (Figure 1C) (Davis et al., 2012). 
This was significant when calculated at either 60- or 20-s 
intervals (Figures 1C and S1A), highlighting that the time of microtubule
alignment is correlated with a sudden change of motion
during CIL. 

The time of microtubule alignment allowed us to temporally 
register collisions and extend the time course of the acceleration 
analysis. We observed that 120 s before microtubule alignment 
there was a sudden forward acceleration, and 180 s after, an 
additional back acceleration event (Figure 1C). To determine 
whether these accelerations were due to changes in cell speed 
and/or direction, we quantified the internuclear distance of 
colliding cells during the CIL time course. Two minutes prior to 
microtubule alignment, the graph of internuclear distance over 
time revealed a sudden increase in slope (Figure 1D). This suggested
that the cell speed increased, which was confirmed by
analyzing the nuclear displacement rates (Figure 1E). Immediately
upon microtubule alignment, the speed reduced (Figure 1E),
which explained the sudden back acceleration (Figure S1A), and
~120 s later the nuclei moved apart (Figure 1D). Analysis of the



SD of the internuclear distance over time also highlighted these 
stages by showing an abrupt decrease in variance as cells pro- 
gressed from one phase to the next, suggesting that these 
stages were differentially regulated (Figure SIB). These distinct 
phases were also visualized by calculating the average velocity 
vector of left and right colliding cells (Movie S1), which additionally
revealed the coordinated behavior of hemocytes during CIL.
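
The internuclear distance and nuclear displacement rate used in this paragraph can be computed from paired tracks along the following lines. This is a hedged sketch: the helper names and the assumption of time-registered, equally sampled tracks are illustrative and are not taken from the original analysis.

```python
import math

def internuclear_distance(track_a, track_b):
    """Distance between the two nuclei at each registered frame."""
    return [math.hypot(ax - bx, ay - by)
            for (ax, ay), (bx, by) in zip(track_a, track_b)]

def nuclear_speed(track, dt):
    """Nuclear displacement rate between consecutive frames (distance per second)."""
    return [math.hypot(x1 - x0, y1 - y0) / dt
            for (x0, y0), (x1, y1) in zip(track, track[1:])]

def closing_rate(distances, dt):
    """Negative slope of internuclear distance; positive while cells approach."""
    return [-(d1 - d0) / dt for d0, d1 in zip(distances, distances[1:])]
```

A sudden rise in `closing_rate` corresponds to the steeper slope in the distance graph, and `nuclear_speed` distinguishes a genuine speed increase from a change in direction alone.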

As we previously revealed that the lamellae of colliding cells 
overlap before microtubule alignment (Stramer et al., 2010), we 
hypothesized that this initial interaction was instigating the kine- 
matic changes. Indeed, lamellae of colliding cells made contact 
105 ± 22 s prior to microtubule alignment (Figures 1F and 1G;
Movie S1), coinciding with the forward acceleration phase of
CIL. Furthermore, an actin fiber developed after lamellae contact 
that connected the colliding cells, which microtubules sub- 
sequently utilized as guides during alignment (Figure 1G). It is 
important to note that the formation of this actin fiber, and the 
subsequent repulsion, was not observed when a hemocyte contacted
the rear of another cell (i.e., not the lamella), or collided
with the lamella of a static cell (Movie S1). This analysis highlights
that an interaction between lamellar actin networks of migrating 
cells is initiating the CIL response. 

Analysis of the separation phase of CIL also revealed a syn- 
chronous response between colliding cells. Kymography of 
lamellar retraction revealed that colliding partners simulta- 
neously retracted their lamellae at two to three times the speed 
of retraction events of freely moving cells (Figures 1H-1J). This 
retraction event initiated 32 ± 22 s after microtubule alignment, 
which coincided with the initiation of movement away from the 
colliding partner. The retraction of lamellae occurred prior to 
the development of new protrusions away from the colliding 
partner, suggesting that rapid lamellar retraction initiates cell 
repolarization (Figures S1C and S1D). Different cell types exhibit
distinct CIL behaviors, which are classified as either type 1 
(involving contact-induced lamellar contraction) or type 2 (inhibition
of locomotion without contraction) responses (Stramer et al.,
2013). Our analysis suggests that hemocytes undergo a classical
type 1 response similar in description to chick heart fibroblasts, 
which were also observed to exhibit sudden lamellar recoil 
(Abercrombie and Heaysman, 1953). 

The Actin Cytoskeleton Rapidly Reorganizes in Colliding 
Partners during CIL 

To understand how the actin networks were mediating the 
response, we analyzed the actin retrograde flow dynamics dur- 
ing CIL. Time-lapse movies of freely moving hemocytes, labeled 
with the actin probe, LifeAct-GFP, during their developmental 
dispersal revealed a highly dynamic actin network within their 
lamellae (Movie S2). We adapted a fluorescent pseudo-speckle 
tracking technique (Betz et al., 2009) to quantify the precise 
speed and direction changes of actin flow within hemocytes 
in vivo (Figure S2A; Movie S2). This revealed that freely moving 
hemocytes in vivo have a mean actin flow rate of 3.2 ± 1.8 μm/min,
which is similar to growth cones in vitro (Betz et al., 2009).

Live imaging of collisions revealed significant reorganization of 
the actin networks during the response (Figure 2A; Movie S2). 
This reorganization coincided with the development of an actin 
fiber, which ran perpendicular to the leading edge, linking the

Figure 1. Hemocyte Contact Inhibition Involves Multiple Stages that Are Synchronous and Coordinated in Colliding Partners

(A) Dispersal of hemocytes labeled with a nuclear 
marker (red) beneath the ventral surface of a 
Drosophila embryo (bright-field) at developmental 
stages 14 and 15. 

(B) Automatic tracking of nuclei (red) of colliding 
hemocytes while also registering collisions with 
microtubules (green). Time point of microtubule 
alignment (arrowhead) allows for temporal regis- 
tration of CIL events in subsequent kinematic 
analyses. 

(C) Time course of hemocyte accelerations 
(black arrows) surrounding a collision event with 
reference to the colliding partner (red arrow). 
All time points show random accelerations except 
at -120, 0, and 180 s where there is a bias along 
the X axis. *p < 0.05, ***p < 0.001.

(D) Graph showing the internuclear distance of 
colliding cells during the CIL time course. Note the 
change in slope at -120, 0, and 120 s. Error bars 
represent SD. 

(E) Graph showing nuclear speed during collisions. 
Note the increase in speed at -120 s and the 
subsequent decrease upon microtubule align- 
ment. Error bars represent SD. 

(F) Time-lapse sequence of colliding hemocytes 
labeled with an F-actin (magenta) and a micro- 
tubule (green) probe. Arrows highlight region of 
lamellae overlap. 

(G) Kymograph of the region highlighted in (F) 
showing the time course of actin fiber formation 
(arrowhead highlights the initial development of 
the actin fiber) and microtubule alignment. 

(H) Kymograph of lamellar activity (red regions 
show lamellar retraction and blue extension) in 
colliding partners along the actin fiber (red dotted 
line in schematic). Note that retraction is simulta- 
neous in colliding cells upon lamellae release. 

(I) Quantification of the rate of lamella retraction 
over time. Error bars represent SEM.

(J) Quantification of lamella retraction rates at 
5 and 20 s after cell separation compared with 
average retraction rates in freely moving cells. 
Error bars represent SD. *p < 0.05. 

See also Figure S1 and Movie S1.



lamellae of colliding cells (Figures 2A and S2B). Pseudo-speckle 
microscopy of collisions highlighted a slowing of the actin flow 
within a corridor that colocalized with the actin fiber (Figure 2A; 
Movie S2) and the aligned microtubule bundle (Figure 2B; Movie 
S2). Quantification of the actin flow rate within the region sur- 
rounding the actin fiber revealed a decrease in magnitude during 
the response, which suddenly increased upon lamellae separation
(Figures 2C, 2D, and S2C; Movie S2). It is interesting to
note that these analyses highlight that
the increase in actin flow speed occurred
in two phases; there was an abrupt spike
immediately upon lamellae separation
lasting ~20 s (that coincided with the
duration of lamella recoil) (Figure 1I), followed
by an additional increase during cellular repolarization
(Figures 2D and S2C). Analysis of instantaneous changes in 
flow direction also revealed an increase in rotation after lamellae 
contact (Figure 2E), which rapidly returned to levels observed in 
freely moving cells after lamellae separation (Figure 2F; Movie 
S2). The change in flow direction coincided with a movement 
of actin fibers within the lamella toward the nascent actin cable, 
which contributed to its formation (Figure S2D). Upon lamellae

Figure 2. During Contact Inhibition, the Actin Network Is Rapidly Reorganized in Colliding Partners

(A) Top panels are still images from a time-lapse 
movie of hemocytes containing labeled F-actin 
during a collision. While cells are in contact, an 
actin fiber develops between the cell body and the 
point of contact in colliding partners (arrowheads), 
which often deforms and breaks upon lamellar 
retraction (red arrow). Bottom panels highlight 
actin flow dynamics obtained from the pseudo- 
speckle analysis. Note that the decreased actin 
flow in the vicinity of lamellae overlap (highlighted 
by yellow arrows) is due to the inability of the 
algorithm to distinguish between the two networks. 

(B) Kymograph of the region surrounding the 
actin fiber highlighting the actin retrograde flow 
dynamics and the alignment of the microtubule 
bundles (pseudocolored white). 

(C and D) Instantaneous changes in retrograde 
flow rate quantified from lamellae contact (C) or 
lamellae separation (D). 

(E and F) Instantaneous changes in retrograde flow 
direction quantified from lamellae contact (E) or 
lamellae separation (F). 

For (C)-(F), error bars represent SD. See also Figure S2 and Movie S2.

separation, the actin fiber deformed and was subsequently
lost as the actin flow rapidly returned to its normal retrograde 
direction (Figure 2A; Movie S2). These data highlight that CIL 
involves a dramatic reorganization of the lamellar actin network. 

Development of a Transient Cell-Cell Adhesion 
during CIL Coincides with a Coordinated Reorganization 
of the Actin Network 

As a number of cell types have been reported to form transient 
inter-cellular adhesions during CIL (Gloushenkova et al., 1998; 
Theveneau et al., 2010), we wanted to determine whether a
cell-cell adhesion was responsible for
the rapid and seemingly coordinated 
actin network changes in colliding hemo- 
cytes. As Zyxin is known to be a marker of 
both cell-matrix and cell-cell adhesions 
(Hirata et al., 2008), we expressed 
mCherry-Zyxin in hemocytes and colo- 
calized Zyxin-labeled adhesions with 
actin during CIL. Immediately upon 
lamellae overlap, a punctum of Zyxin 
developed at the site of cell-cell contact 
and persisted for the duration of the 
response (Figures 3A and 3B; Movie S3). 
Subsequently, the actin fiber formed 
immediately behind this concentration of 
Zyxin (Figures 3A and 3B; Movie S3). We 
hypothesized that Zyxin foci represented 
transient cell-cell adhesions that modu- 
late actin retrograde flow in a process 
analogous to the integrin-based actin- 
clutch reported in migrating cells in vitro 
(Gardel et al., 2008). Indeed, visualization of Zyxin while 
analyzing actin flow revealed that after development of the Zyxin 
puncta the retrograde flow rate decreased within a corridor 
immediately behind (Figures 3C and 3D; Movie S3). Furthermore, 
microtubules polymerized toward this site of adhesion (Figure 3E; 
Movie S3). Immediately upon microtubule targeting of this adhe- 
sion, Zyxin levels decreased (Figure 3F). 

The development of an inter-cellular adhesion during CIL 
suggested that the colliding actin networks were becoming 
physically coupled. We therefore examined whether this 
coupling could lead to synchronous changes in actin retrograde

Figure 3. Actin Network Reorganization Correlates with the Formation of a Transient Cell-Cell Adhesion

(A) Still image of a collision between hemocytes 
expressing mCherry-Zyxin (green) and labeled F- 
actin (magenta), which highlights the inter-cellular 
adhesion at the point of initial contact (arrowhead). 
Arrows highlight region of lamellae overlap. 

(B) Kymograph of Zyxin and actin dynamics in the 
region of the actin fiber. Note that the punctum of 
Zyxin forms in line with the actin fiber and persists 
for the duration of the time in contact (arrowhead 
highlights the initial formation of the punctum). 

(C) Quantification of the maximum intensity of 
Zyxin and average actin flow rate during the 
collision. 

(D) Analysis of actin flow dynamics in comparison 
with Zyxin localization (pseudocolored white). Note 
that the region of low retrograde flow develops in 
line with the inter-cellular adhesion (arrows). 

(E) Kymograph of Zyxin and microtubule dynamics 
in the region of the actin fiber highlighting micro- 
tubule targeting of the Zyxin puncta. 

(F) Maximum intensity of Zyxin and microtubules at 
the inter-cellular adhesion in the region highlighted 
in (E). 

(G-J) Cross correlation of the instantaneous 
changes in flow rate (G and H) and flow direction (I 
and J) in lamellae of colliding cells. Error bars 
represent SEM. Red dotted lines represent the 
mean correlation between colliding cells immedi- 
ately prior to cell-cell contact with the thickness 
representing the SEM. 

See also Movie S3. 



flow in colliding cells. Investigation of the correlation between
instantaneous changes in flow speed in colliding partners
revealed an increase immediately upon lamellae overlap, which
slowly diminished as cells remained in contact (Figure 3G).
Subsequently, at the time of lamellae separation there was an
additional abrupt increase in the correlation of instantaneous
changes in flow speed (Figure 3H; Movie S3). Similarly, correlation
between the instantaneous changes in flow direction of
colliding partners showed an increase ~20 s after lamellae contact,
which remained high until the time of
separation (Figures 3I and 3J). These data
reveal that, similar to the orchestrated
motion of colliding cells, the actin networks
are behaving in a coordinated
fashion during CIL.
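
One way to quantify the synchrony between colliding partners is a sliding-window correlation of frame-to-frame changes in flow speed. The sketch below uses a Pearson coefficient over a fixed window, which is an assumption made for illustration; the exact correlation measure and window used in the original analysis are not specified here, and all names are hypothetical.

```python
import math

def instantaneous_changes(series):
    """Frame-to-frame differences of a flow-rate (or flow-direction) series."""
    return [b - a for a, b in zip(series, series[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def windowed_correlation(flow_a, flow_b, window):
    """Correlation of instantaneous flow changes in two cells over a sliding window."""
    da = instantaneous_changes(flow_a)
    db = instantaneous_changes(flow_b)
    n = min(len(da), len(db))
    return [pearson(da[i:i + window], db[i:i + window])
            for i in range(n - window + 1)]
```

Values near 1 indicate that the two lamellae speed up and slow down together, values near 0 indicate independent fluctuations, and the level measured just before cell-cell contact serves as the baseline for comparison.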



Increase and Redistribution of Actin
Network Stress during Cell 
Collision 

The sudden and synchronous retraction 
event that occurred upon cell separation 
(Figure 1H) suggested that tension is 
developing within the actin network dur- 
ing CIL. We therefore analyzed lamellar 
tension by laser abscission of the actin 
cytoskeleton as the recoil rate of the 
network is indicative of tension within the lamella. Ablating the 
leading edge or an actin fiber within the lamellae of a freely moving
cell led to an initial recoil rate of 28.6 ± 6.8 and 33.2 ± 4.3 μm/min,
respectively. Interestingly, the recoil was unidirectional toward
the cell body when the ablation was performed within the
lamella (Figures 4A and 4B; Movie S4), which may be explained 
by a bias of myosin contraction toward the rear of the lamella 
(Svitkina et al., 1997; Yam et al., 2007). In contrast, cutting the 
region of lamella overlap across the actin fiber linking colliding

Figure 4. Lamellar Stresses Are Increased and Redistributed during CIL

(A) Kymographs of lamellar recoil upon laser 
abscission of the actin network in freely moving 
and colliding cells. Dotted rectangle highlights the 
width of the ablation region. 

(B) Quantification of recoil rate over time and initial 
recoil rate upon laser abscission. Error bars 
represent SEM. **p < 0.01.

(C) Quantification of lamellae strain over time upon
laser abscission and modeled forces assuming 
that the actin network behaves elastically over 
short time scales. The elastic and dissipative 
mechanical properties in the lamellae are modeled 
by an exponential decay of the strain that is over- 
laid onto the constant retrograde flow. Note that 
zero strain represents the end of the exponential 
decay. Assuming mechanical properties similar to 
previously published lamellae we can estimate the 
tension. Inset: Sketch illustrating the mechanical 
model of an elastic and dissipative element. The 
strain u is calculated by the ratio Δl/l. Error bars
represent SEM. **p < 0.01.

(D) Hemocyte velocities in freely moving and 
colliding cells 60 s after laser abscission with 
respect to the ablation site (red arrow). Magenta 
arrow is the average direction of the population. 
Note that after mock ablation there was a signifi- 
cant forward movement of cells, while ablation of 
the fiber during cell collision led to a significant 
rearward movement. *p < 0.05. 

(E) Localization of actin network stress during cell 
collision. Top panels: a time-lapse series of a he- 
mocyte containing labeled F-actin undergoing a 
collision (adapted from Figure 2A). Bottom panels: 
modeled intracellular actin stresses. Note that 
stresses were only measured for regions of the 
lamella that persisted for a 40-s period as defor- 
mation history is required in the analysis. Arrows 
highlight region of lamellae overlap. Dotted line 
highlights the redistribution of stresses around the 
cell body and asterisks the regions of high stress 
that colocalize with the actin fiber. 

(F) Kymograph of lamellar stresses over the region 
that colocalized with the actin fiber. Note the 
redistribution of stress from the back of the 
network to the front. 

(G) Kymograph of the instantaneous changes in 
actin flow direction in the region colocalizing with 
the actin fiber. 

(H) Quantification of the mean change in flow 
direction of the actin network in three regions 
corresponding to the back, middle, and front of the 
actin fiber. Note that the changes initially increase 
in the rear of the network. 

See also Movie S4. 



cells led to a significantly enhanced retraction rate of 65.8 ±
11.1 μm/min, suggesting increased tension was stored within
the actin network during CIL (Figures 4A and 4B; Movie S4). 
We subsequently modeled the amount of force present within 
the actin network by assuming that the actin cytoskeleton be- 
haves elastically over short time scales (Gardel et al., 2004). 
Tracking edge displacement of the lamellae upon laser abscission
allowed us to measure the changes in strain of the lamellar
network as it retracts. This revealed that laser abscission of the 
lamellae leads to an initial rapid exponential decay that is caused 
by the sudden release of lamellar tension, followed by a slower 
linear phase as the retrograde actin flow continuously pulls in 
the network (Figure 4C). Assuming a lamellar stiffness similar 
to growth cones, the measured strain upon the release of the 



366 Cell 161, 361-373, April 9, 2015 ©2015 The Authors














lamellar tension allowed us to infer the forces present within the 
actin network in freely moving and colliding cells. This revealed 
that forces in colliding lamellae were ~3-fold higher than freely 
moving cells (Figure 4C). Furthermore, release of this tension 
by laser abscission during cell collision was sufficient to induce 
a rearward movement of the cell body away from the ablation 
site (Figure 4D; Movie S4). 
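
The two-phase retraction described above (a fast exponential relaxation followed by a slower linear phase driven by retrograde flow) can be written as a simple model. The sketch below is an illustration under assumptions: the functional form is one plausible reading of "an exponential decay overlaid onto the constant retrograde flow", and the parameter names (`d_elastic`, `tau`, `v_flow`, `rest_length`) are hypothetical rather than taken from the original methods.

```python
import math

def retraction(t, d_elastic, tau, v_flow):
    """Edge displacement at time t after ablation.

    d_elastic: total displacement released by the elastic relaxation.
    tau:       relaxation time constant of the exponential phase.
    v_flow:    constant retrograde flow speed (linear phase).
    """
    return d_elastic * (1.0 - math.exp(-t / tau)) + v_flow * t

def initial_recoil_rate(d_elastic, tau, v_flow):
    """Slope of retraction() at t = 0, the quantity compared between conditions."""
    return d_elastic / tau + v_flow

def elastic_strain(d_elastic, rest_length):
    """Strain u = (change in length) / (reference length) released on ablation."""
    return d_elastic / rest_length
```

If the lamellar stiffness is assumed equal in both conditions, the ratio of released elastic strain between colliding and freely moving cells gives the fold change in stored force; an absolute force estimate additionally requires a stiffness value, which this sketch does not supply.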

We also adapted previously developed techniques to infer the 
localization of stresses that drive the retrograde flow using 
an estimation of actin network deformation (Betz et al., 2011; 
Ji et al., 2008). As previous correlation of flow changes sug- 
gested that the colliding actin networks were becoming coupled 
(Figures 3G-3J), the model assumed that the colliding networks 
were behaving as a single visco-elastic material. Interestingly, 
this analysis revealed that there was a redistribution of stresses 
during the response. Prior to cell-cell contact, most of the stress 
was localized around the cell body similar to freely moving 
hemocytes. However, upon collision the stress redistributed 
from around the cell body to the base of the actin fiber (Figure 4E; 
Movie S4) where it subsequently propagated to more distal re- 
gions of the fiber (Figure 4F). After cell separation, the stress 
became localized again to the region around the cell body as 
in freely moving cells (Movie S4). This stress redistribution indi- 
cated that actin cytoskeletal changes were spreading, not from 
the site of lamellae contact, but from the rear of the network. 
Indeed, when we examined the distribution of instantaneous 
changes in the direction of actin flow, we observed a similar 
propagation from the cell rear (Figures 4G and 4H). These data 
confirm that lamellar tension increases during CIL and propa- 
gates from the rear of the network, suggesting that these cyto- 
skeletal changes are not initiated by a local signal released 
from the site of cell contact. 

Coupling of Colliding Actin Networks Leads to the 
Development of a Transient Stress Fiber 

As Myosin II is the major motor responsible for generating 
contraction within the actin cytoskeleton, we examined its dy- 
namics during hemocyte migration. In freely moving cells, both 
actin and Myosin II flow in a similar retrograde fashion (Figures 
5A, 5B, S3A, and S3B). However, comparable to our previously 
observed changes in actin flow direction during CIL (Movie 
S2), Myosin II flow reoriented perpendicularly toward the actin fi- 
ber (Figures 5C, 5D, and S3C; Movie S5). The actin fiber devel- 
oped coincidentally with Myosin II accumulation along its length 
(Figures 5E and 5F) and subsequently became decorated with 
repeating puncta of Myosin II ~1.4 μm apart (Figure S3D), similar
to stress fibers in vitro (Hotulainen and Lappalainen, 2006). 
Furthermore, the microtubule bundle aligned along the Myosin 
II decorated actin fiber (Movie S5). Analogous to the propagation 
of modeled stresses from the rear of the actin fiber, Myosin II intensity
first increased at the back of the lamella (Figure 5G) and
colocalized with the corridor of reduced retrograde flow (Fig- 
ure 5H). Upon lamellae retraction, the Myosin II puncta along 
the actin fiber rapidly moved in a retrograde fashion with the 
actin network toward the cell body (Movie S5). 

We also examined the localization of the formin Diaphanous
(Dia), which has previously been shown to decorate stress fibers 
within cells in vitro and be important for stress fiber assembly
(Nakano et al., 1999; Sandbo et al., 2013). Expression of constitutively
active Dia within hemocytes led to reduced cell
spreading and an accumulation of Myosin within the lamella, 
with periods of severe contraction of the actin network suggest- 
ing that activation of Dia can increase lamellar tension (Fig- 
ure S3E; Movie S5). As Drosophila Dia is also known to be 
involved in filopodia formation (Homem and Peifer, 2009), it 
was unsurprising to observe wild-type Dia localized to filopodia 
in freely moving cells (Figures S3F and S3G). However, upon 
cell collision, Dia localized along the actin fiber (Figures 5I and
5J). These data suggest that the previously described actin 
fiber is a stress fiber-like structure that couples colliding actin 
networks during CIL. 

Myosin II-Dependent Contraction and Stress Fiber
Formation Are Essential for Coordinating the CIL 
Response in Colliding Cells 

We subsequently examined the importance of lamellar contrac- 
tion and stress fiber formation in regulating the CIL response. We 
first analyzed actin flow in hemocytes of embryos mutant for 
myosin II heavy chain (hereafter referred to as Myosin II and
called Zipper [Zip] in the fly). Drosophila zygotic mutant embryos
(zip^) only begin to show defects at late stages of embryogenesis
(when we also begin to image hemocyte motility) suggesting that 
maternal levels of the protein perdure to this stage of develop- 
ment (Young et al., 1993). Mutant hemocytes were initially 
capable of migrating from the head where they originate, but 
began to fail in motility soon after this stage (Figure S4A). This re- 
veals that, in contrast to the reported dispensability of Myosin II 
in cell migration in 2D environments in vitro (Doyle et al., 2012), 
Myosin II is critical for hemocyte motility during embryogenesis. 
Analysis of retrograde flow in myosin II mutants showed that they
had a significant reduction in speed (1.5 ± 1.0 μm/min) (Figures
S4B and S4C; Movie S6). Expression of a GFP-tagged Myosin
II specifically in hemocytes within zip^ mutants rescued their
developmental dispersal (Figure S4A) and retrograde flow rates 
(Figures S4B and S4C), showing that Myosin II is critical for actin 
flow in vivo. These data suggest that myosin II mutant hemocytes 
have a reduction in the contractility of their actin networks. 

We subsequently examined the lamellar dynamics of myosin II 
mutant hemocytes during collisions. Time-lapse movies of zip^
hemocyte collisions revealed that colliding cells failed to reorga- 
nize their actin networks (Figure 6A), or increase their lamellar 
retraction rates upon separation (Figures 6B and 6C; Movie 
S6). They also did not develop a prominent actin fiber during 
cell collision (Figure 6D). These defects were accompanied by 
a failure to cease their forward motion upon collision and an 
increased time in contact during the CIL response (Figures 6E 
and 6F). However, similar to fibroblasts in vitro (Giannone 
et al., 2007), loss of Myosin II in freely moving hemocytes also 
reduced their rate of lamellar retraction, showing that they had 
general defects in lamellar tension (Figure 6C). 

We therefore specifically analyzed the role of the stress fiber 
during CIL by examining the response in diaphanous (dia^) mutants.
In contrast to zip^ hemocytes, freely moving dia^ mutant
cells showed no aberration in actin retrograde flow (Figures 
S5A-S5C; Movie S6), cell migration (Figures S5D and S5E), or 
rate of lamellar retraction (Figure 6I). However, colliding cells

Figure 5. The Actin Fiber that Couples Colliding Cells Is a Stress Fiber-like Structure

(A) Still image of a freely moving hemocyte con- 
taining labeled F-actin (magenta) and Myosin II 
(green). 

(B) Quantification of Myosin II tracks in freely 
moving cells. 

(C) Still image of a collision between hemocytes 
containing labeled actin and Myosin II. Note the 
puncta of Myosin II along the actin fiber (inset). 
Arrows highlight region of lamellae overlap. 

(D) Quantification of Myosin II tracks for 40 s upon 
lamellae overlap during CIL. 

(E) Kymograph of the region surrounding the actin 
fiber in (C) highlighting Myosin II accumulation 
during a collision. 

(F) Quantification of the increase in actin and 
Myosin II intensity in the region corresponding to 
the actin fiber relative to values prior to lamellae 
contact. Error bars represent SEM. 

(G) Quantification of Myosin II intensity in regions 
corresponding to the back versus the front of the 
actin fiber during CIL. 

(H) Analysis of actin flow dynamics in comparison 
with Myosin II localization (pseudocolored white). 
Note that actin network reorganization precedes 
Myosin II accumulation along the stress fiber. 

(I) Still image of a collision between hemocytes 
containing labeled actin (magenta) and Diapha- 
nous (green). Arrows highlight region of lamellae 
overlap. 

(J) Kymograph of the region surrounding the actin 
fiber in (I). 

See also Figure S3 and Movie S5.

showed a highly variable response (Movie S6) and, on average,
failed to generate an actin fiber (Figure 6D) or coordinate their 
actin dynamics (Figures 6G and S5F-S5I). Additionally, their 
lamellar retraction rates upon collision were no different from 
freely moving cells (Figures 6H and 6I). Furthermore, similar to
zip^ hemocytes, dia^ mutant cells showed a reduced capacity
to cease their forward movement upon collision (Figure 6E)
and persisted in contact for a longer duration
(Figure 6F). Despite these defects,
dia^ hemocytes, similar to wild-type
cells, formed a Zyxin punctum and showed
some microtubule alignment during collision
(Figure S6). These data reveal that
preventing stress fiber formation results 
in an aberrant response whereby cells 
eventually separate, but in the absence 
of the sudden lamellar retraction charac- 
teristic of type 1 CIL. 

A Coordinated CIL Response Is 
Essential for Hemocyte Dispersal 

As dia^ mutant hemocytes showed uncoordinated
cytoskeletal dynamics during
collisions (Figures S5G and S5I), we
wanted to determine whether this defect also led to an uncoordinated
kinematic response during CIL. Analysis of the
acceleration time course revealed that, in contrast to the three 
acceleration changes observed in wild-type hemocytes (Fig- 
ure 1C), dia^ mutant cells only showed the back acceleration 
upon microtubule alignment (Figure 7A). However, this back 
acceleration was significant only when calculated at 60-s

Figure 6. A Stress Fiber-like Structure Is Required for a Normal CIL Response

(A-C) Myosin II mutant (zip^) collisions. (A) Top
panels are still images from a time-lapse movie of 
hemocytes containing labeled F-actin during a 
collision. Bottom panels are heatmaps obtained 
from the pseudo-speckle analysis showing no 
substantial changes in retrograde flow. Arrows 
highlight region of lamellae overlap. (B) Kymograph 
of lamellar activity in colliding partners in a region 
perpendicular to the point of cell contact (red re- 
gions highlight lamellar retraction and blue exten- 
sion). (C) The speed of lamellar retraction in myosin 
II mutants was quantified at the time of separation 
to reveal that the retraction rate was no different 
to freely moving cells. Error bars represent SD. 
*p < 0.05. Note that control retraction rates are 
from Figure 1J. 

(D) Quantification of actin fiber formation in control, 
zip^ and dia^ mutant hemocytes during CIL. The
graph represents the relative increase in actin 
intensity within the region encompassing the actin 
fiber (red box in schematic) with respect to the 
surrounding regions of the actin network (blue 
boxes in schematic). 

(E) Quantification of the cessation of forward 
movement during CIL in which the mean distance 
between the initial point of contact and the 
nucleus was measured and compared to the dis- 
tance at the time of cell separation. This analysis 
revealed that the zip^ and dia^ mutants failed
to inhibit their forward motion in comparison to 
control cells. Error bars represent SD. *p < 0.05. 

(F) Graph of mean time of lamellae contact 
revealed that zip^ and dia^ mutants maintained
cell-cell contacts for a longer duration than control
cells. Error bars represent SD. **p < 0.01.

intervals (Figure S7A), suggesting their response was not as
tightly coordinated as controls (Figure S1A). Furthermore, after 
microtubule alignment the dia^ mutant cells showed no obvious 
movement away from their colliding partners (Figures 7B and 
S7B). We also quantified the velocities 240 s after microtubule 
alignment, which corresponds to the time when both control 
and dia^ mutant cells have separated (Figure 6F). This revealed 
that while control cells migrated away from the collision, dia^ mu- 
tants showed no significant directional preference with respect 
to their colliding partners (Figure 7C). These data suggest that 
dia^ mutant hemocytes fail to actively repel each other. 



(G-I) diaphanous mutant (dia^) collisions analyzed
as in (A), (B), and (C). 

See also Figures S4, S5, and S6 and Movie S6. 



We next determined how the alteration 
in cell repulsion in dia^ mutant hemocytes 
affected their ability to form an evenly 
spaced pattern during their develop- 
mental dispersal. Analysis of the average 
regions occupied by hemocytes during 
their dispersal revealed that control cells 
migrated within defined domains along 
the ventral surface of the embryo (Fig- 
ure 7D). In contrast, dia^ mutant hemocytes showed an aberrant 
domain pattern (Figure 7D) and cells patrolled for greater dis- 
tances within the embryo (Figures 7E and 7F), suggesting that 
they were less confined in their motility. dia^ mutant hemocytes
also had a slight reduction in cell spacing as shown by their
decreased nearest neighbor distances (median = 17.5 μm for
controls and 15.8 μm for dia^, p < 0.05) (Figures S7C and S7D;
Movie S7). This suggests that precise repulsion dynamics are 
helping confine the migration of cells to defined regions within 
the embryo. Previous mathematical modeling of hemocyte 
dispersal suggested that tightly controlled CIL behavior was

Figure 7. A Coordinated CIL Response Is Necessary for Hemocyte Patterning

(A) Time course of hemocyte accelerations in dia^ mutants (black arrows) surrounding a collision event with reference to the colliding partner (red arrow). All time 
points show random accelerations except at the time of microtubule alignment. **p < 0.01.

(B) Quantification of average cell direction during the CIL time course as highlighted in the schematic. Blue highlights forward movement and red movement away 
from the colliding partner. Error bars represent SD. 

(C) Cell velocities at 240 s after microtubule alignment with respect to the colliding partner (red arrow). Magenta arrows are the resultant velocities. Note that only 
controls show a significant movement away from the colliding partner. *p < 0.05. 

(D) The average regions occupied by hemocytes during their developmental dispersal revealed a disruption in the even spacing in diaphanous mutants. 

(E) Tracks of hemocytes migrating over a 20-min period after they have spread throughout the embryo. 

(F) Quantification of the maximum distance hemocytes migrate from the tracks measured in (E) revealed that dia5 mutants migrate over greater distances in the 
embryo. ***p < 0.001. 

See also Figure S7 and Movie S7. 



essential for their normal dispersal (Davis et al., 2012). As hemo- 
cytes in dia5 mutants showed no directional preference with 
respect to their colliding partners upon cell separation (Fig- 
ure 7C), we wanted to determine how this reflected on the overall 
cell distribution in the simulation. Indeed, randomizing the 
sensitivity of simulated cells to the direction of their colliding 
partners (which in control simulations is a fixed parameter) led 
to a similar acquisition of aberrant domains (Figures S7E and 
S7F; Movie S7) and a slight reduction in cell spacing (median = 
24.2 μm for wild-type parameters and 23.3 μm for randomized 
repulsion, p < 0.001). These data show that a precisely orches- 
trated CIL response in hemocytes is essential for it to behave 
as an efficient patterning cue. 
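
The cell-spacing comparison above rests on nearest neighbor distances and their median. As a minimal sketch of that kind of measurement, assuming cell positions are available as 2D coordinates, the following computes each cell's distance to its closest neighbor. The function name `nearest_neighbor_distances` and the coordinates are hypothetical illustrations, not the authors' analysis code or data.

```python
import math
import statistics


def nearest_neighbor_distances(points):
    """Return, for each (x, y) point, the distance to its closest other point."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    distances = []
    for i, (xi, yi) in enumerate(points):
        # Brute-force search over all other points; fine for a few hundred cells.
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(nearest)
    return distances


# Hypothetical cell-centre coordinates in micrometres (not data from the study).
cells = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0), (10.0, 6.0)]
nn = nearest_neighbor_distances(cells)
median_spacing = statistics.median(nn)
print(nn, median_spacing)
```

A lower median over many embryos would indicate tighter packing, which is the direction of the change reported for the mutant and for the randomized-repulsion simulation.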



DISCUSSION 

Here, we show that hemocyte dispersal requires a precisely 
orchestrated CIL response; cells are not stochastically repelling 
each other. Hemocyte collision and subsequent repulsion in- 
volves a stereotyped sequence of kinematic stages that are 
modulated by synchronized changes in actin and microtubule 
dynamics. These integrated cytoskeletal changes are regulated 
by a transient inter-cellular adhesion, which physically couples 
the actin networks of colliding cells and builds up lamellar 
tension. It is this inter-cellular actin-clutch and the mechanical 
response to the collision that allows for a precise orchestration 
of cellular behaviors during CIL. 
In recent years, there has been speculation that clutch-like 
mechanisms also exist at cell-cell junctions (Giannone et al., 
2009). Indeed, engagement of cell-cell adhesion molecules 
in neuronal growth cones leads to similar slowing of actin 
retrograde flow (Schaefer et al., 2008). Furthermore, apical 
constriction during gastrulation, which is driven by acto-myosin 
contraction, is hypothesized to be induced by a “clutch-like” 
adhesion (Roh-Johnson et al., 2012). During CIL in hemocytes, 
this cell-cell adhesion is critical to orchestrate both intra- 
cellular responses and inter-cellular behaviors, which allows 
the response to be coordinated in colliding cells. As apical 
constriction during gastrulation is also coordinated in space 
and time across numerous cells within an epithelial sheet (Martin 
et al., 2010), it is possible that a similar clutch-like mechanism 
is allowing such inter-cellular integration of forces. 

The engagement of a transient inter-cellular adhesion is char- 
acteristic of CIL in a number of cell-types (Gloushankova et al., 
1998; Heaysman and Pegrum, 1973; Theveneau et al., 2010). 
This transient adhesion is very similar to the initial punctate adhe- 
sion between epithelial cells prior to their formation of a mature 
cell-cell junction, which also involves a radial actin bundle 
running perpendicular to the leading edge (Adams et al., 1998; 
Gloushankova et al., 1998). These transient cadherin-based 
adhesions, which have been called focal adherens junctions 
(Huveneers and de Rooij, 2013), depend on the development 
of tension (Huveneers et al., 2012). Indeed, cadherins in epithelial 
cells, astrocytes, and fibroblasts are observed to flow in a retro- 
grade fashion with the actin network (Peglion et al., 2014), which 
has led to speculation that cadherin-based cell-cell adhesions 
could lead to an analogous clutch-like mechanism during matu- 
ration of adherens junctions (Giannone et al., 2009). It will be 
interesting to investigate why epithelial cells transform a focal 
adherens junction into a stable cell adhesion, whereas in fibro- 
blasts and hemocytes the adhesion results in active repulsion. 

Subsequent to adhesion engagement during CIL, we observe 
a sudden and synchronous reorganization of the colliding actin 
networks. The engagement of this actin-clutch develops tension 
between colliding cells, which we hypothesize causes the sud- 
den forward acceleration that we observed immediately upon 
lamellar overlap. It is intriguing to note that chick heart fibroblasts 
similarly have a momentary acceleration toward each other dur- 
ing their CIL response (Abercrombie and Heaysman, 1953). This 
intracellular tension may also help form the transient stress fiber 
that couples the colliding cells, as stress fiber formation is a 
tension responsive process (Burridge and Wittchen, 2013). 
This creates a mechanism of haptic feedback whereby the cells 
“pull” on each other with the stress fiber acting as a mechano- 
sensor during collisions, similar to its hypothesized role in 
sensing substrate stiffness (Trichet et al., 2012). The contraction 
of this actin fiber, which must be embedded within the lamellar 
actin network, would also explain the network-wide reorganiza- 
tion of actin flows in a process analogous to the “contractile 
treadmilling” observed toward regions of actomyosin contrac- 
tion in fibroblasts (Rossier et al., 2010). 

As the actin networks reorganize in colliding hemocytes, 
microtubules polymerize into the region of low retrograde flow 
(that also correlates with the region of stress fiber development). 
Microtubules polymerize toward the leading edge in a number of 



cell-types and undergo frequent catastrophes as they fight 
against the flowing actin network (Waterman-Storer and Salmon, 
1997). In growth cones, upon adhesion engagement, the actin 
retrograde flow is slowed, allowing microtubules to polymerize 
toward the contact site (Schaefer et al., 2008) analogous to 
the initial stages of hemocyte CIL. We therefore speculate that 
the microtubule alignment observed between colliding cells dur- 
ing CIL is the result of microtubules following a path of least 
resistance within the actin network. However, it is also possible 
that microtubules are coupled to the stress fiber through an 
actin-microtubule crosslinker. Nevertheless, there must be tight 
coordination of both actin and microtubules during the CIL 
process. 

While it is clear that microtubules are critical for CIL in a num- 
ber of cell-types (Kadir et al., 2011; Stramer et al., 2010), their 
exact role during the response is currently unknown. The syn- 
chronous back acceleration in colliding partners is strongly 
correlated with the time when microtubule alignment occurs, 
suggesting that microtubule bundles are playing a role in stop- 
ping forward movement. The microtubules also appear to be 
critical for the generation of the precise internuclear spacing of 
the cells during CIL, which is critical for the emergence of the 
even dispersal pattern (Davis et al., 2012). Part of this spacing 
is governed by the regular size of hemocyte cell bodies; we hy- 
pothesize that the remainder of the nuclear spacing is controlled 
by the physical and dynamic properties of the microtubules 
themselves. As microtubules are rigid and capable of bearing 
compressive loads, the internuclear spacing may be controlled 
by a combination of tensional elements (i.e., the acto-myosin 
network) and elements that resist compression (i.e., microtu- 
bules). It may be the sum of these mechanical components, in 
a process analogous to cellular tensegrity (Ingber, 2003), that 
allows the cells to precisely define their separation distance. 
Another possible role for microtubules may be in regulating 
the cell-cell adhesion. Microtubules target cell-matrix adhesions 
to promote their dissolution (Kaverina et al., 1999), and an 
analogous process may be occurring at the cell-cell adhesion 
during CIL. 

Finally, the retraction phase of CIL involves a seemingly syn- 
chronous lamellar response in colliding partners. It is possible 
that the tension generated by engagement of the actin-clutch 
is suddenly released due to a breaking of the cell-cell adhesion. 
Alternatively, the tension may cause cytoskeletal components, 
such as the microtubule bundle or the actin stress fiber, to un- 
dergo a sudden catastrophe. Either way, it is this sudden release 
of lamellar tension that causes the synchronous retraction of 
colliding partners, which is essential to generate a choreo- 
graphed CIL response. Lamellar retraction occurs prior to the 
generation of new protrusions away from the contact site, and 
we hypothesize that it is this sudden contraction that drives the 
subsequent repolarization phase of the response. Indeed, 
acto-myosin contractility initiates symmetry breaking and polar- 
ization in a number of cell-types (Cramer, 2010; Yam et al., 2007). 
It is also interesting to note that during initiation of polarized cell 
motility there is a propagation of actin network changes from the 
lamellar rear to the front (Yam et al., 2007), which we also 
observed during CIL initiation, although the mechanics behind 
this redistribution are currently unclear. 
Here, we reveal that hemocyte CIL involves distinct stages, 
leading to both cells retracting from one another and subse- 
quently repolarizing. It should be made clear that not all contact 
inhibitory cell-types undergo a similar sequence of events, as CIL 
is not a homogeneous response (Stramer et al., 2013). We favor 
the idea of broadly separating CIL into two types, first envisaged 
by Abercrombie (1970) and Vesely and Weiss (1973): type 1, 
which involves active retraction (e.g., hemocyte and fibroblast 
CIL), and type 2, in which forward movement is randomly de- 
flected or stopped altogether (e.g., dia5 mutant hemocytes, 
epithelial wound closure). While inhibition of cellular protrusions 
is sometimes thought to be the hallmark of CIL (Mayor and Car- 
mona-Fontaine, 2010), Abercrombie (1970) believed that the 
predominant response in colliding cells is “a spasm of contrac- 
tion” that “obliterates the process of ruffling.” It is also inter- 
esting to note the initial description of CIL by Abercrombie and 
Heaysman (1953): “As a result of the adhesion.... they [cells] 
push or pull against each other to some extent; some of the 
energy which would normally go into movement is thereby dissi- 
pated or becomes potential energy of elastic tension between 
the cells. When an adhesion breaks, the release of potential en- 
ergy stored as elastic tension produces the sudden accelera- 
tion.” An inter-cellular actin-clutch is an ideal candidate to be 
responsible for such a response. 

EXPERIMENTAL PROCEDURES 

Microscopy 

Embryos were mounted as previously described (Davis et al., 2012) and time- 
lapse images acquired every 5 s (for retrograde flow analysis) or 10 s (for kine- 
matic and co-localization analyses) with a PerkinElmer Ultraview spinning disk 
microscope during developmental dispersal (stages 14-16). See Extended 
Experimental Procedures for a list of fly lines and a more detailed description 
of microscopy. 

Kinematics Analysis and Modeling 

For kinematic analysis, hemocytes were labeled with nuclear and microtubule 
markers and time-lapse movies were acquired at 10 s/frame. Nuclei were 
automatically tracked using Volocity software (PerkinElmer). Microtubule 
alignment was used as a marker for a CIL event, and cells that had not collided 
with another cell for 4 min before and after the microtubule alignment were 
included in the analysis. The velocity and acceleration of cells were calculated 
as previously described (Dunn and Paddock, 1982). See Extended Experi- 
mental Procedures for a more detailed description of kinematics. 
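
To illustrate the kind of quantity involved, the sketch below derives velocities and accelerations from tracked nuclear positions by plain finite differences and projects them onto the direction of the colliding partner. This is an assumption made for illustration and is not the Dunn and Paddock (1982) procedure cited above; the helper names `finite_differences` and `component_toward` and the track coordinates are hypothetical.

```python
import math


def finite_differences(positions, dt):
    """Return per-interval velocities and accelerations from (x, y) positions.

    positions: list of (x, y) nuclear coordinates, one per frame.
    dt: time between frames in seconds.
    """
    velocities = [
        ((x1 - x0) / dt, (y1 - y0) / dt)
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]
    accelerations = [
        ((vx1 - vx0) / dt, (vy1 - vy0) / dt)
        for (vx0, vy0), (vx1, vy1) in zip(velocities, velocities[1:])
    ]
    return velocities, accelerations


def component_toward(vector, own_position, partner_position):
    """Project a vector onto the unit direction pointing at the colliding partner."""
    dx = partner_position[0] - own_position[0]
    dy = partner_position[1] - own_position[1]
    norm = math.hypot(dx, dy)
    return (vector[0] * dx + vector[1] * dy) / norm


# Hypothetical nuclear track in micrometres, sampled every 10 s (not study data).
track = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
partner = (20.0, 0.0)
velocities, accelerations = finite_differences(track, dt=10.0)
# Acceleration i is centred on frame i + 1, so that frame's position is used.
toward = [
    component_toward(a, track[i + 1], partner)
    for i, a in enumerate(accelerations)
]
print(toward)
```

A positive entry in `toward` is an acceleration toward the partner and a negative entry is an acceleration away from it, which is the sign convention needed to distinguish a forward acceleration from a back acceleration around a collision.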

Retrograde Flow Analysis 

Time-lapse images of freely moving or colliding hemocytes containing actin 
labeled with either Lifeact-GFP or Moesin-cherry were acquired at 5 s/frame 
(when imaging actin alone) or at 10 s/frame (when imaging actin with other 
fluorescent probes). For collision analysis, cells were chosen such that the 
cells collided once over the duration of the time course. Cells were manually 
segmented prior to analysis. To quantify retrograde flow rates in lamellae, 
the cell body was manually segmented and data points within this region 
discarded. Pseudo-speckle analysis was performed as described previously 
(Betz et al., 2009). See Extended Experimental Procedures for a more detailed 
description of retrograde flow analysis. 

Laser Abscission 

Hemocytes were labeled with UAS-LifeAct-GFP and UAS-RedStinger and 
imaged on an inverted 780 Zeiss LSM multi-photon confocal with a time inter- 
val of 1 s. Cells were imaged for 5-10 s and then ablated with a two-photon 
laser tuned to 730 nm and focused in a 0.4 μm × 1.5 μm region, either at 
the edge or within the actin network for freely moving cells or at the region 



of lamella overlap along the actin fiber for colliding cells. Hemocytes were 
then imaged for 60 s with only 1.1 s elapsing between frames surrounding 
the ablation. For mock ablation of colliding cells, the same protocol was 
performed as mentioned except the laser was switched off. See Extended 
Experimental Procedures for a more detailed description of modeling lamellar 
forces. 

Modeling Cytoskeletal Stresses 

To compute the stresses inside the actin cytoskeleton in both freely moving 
and colliding hemocytes, the actin network was assumed to behave as a linear 
viscoelastic material. This work used the same model as Betz et al. (2011) 
to calculate the cytoskeletal forces developed by growth cones in vitro. 
See Extended Experimental Procedures for a more detailed description of 
modeling cytoskeletal stresses. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, 
seven figures, and seven movies and can be found with this article online at 
SUMMARY 

Cell movement has essential functions in develop- 
ment, immunity, and cancer. Various cell migration 
patterns have been reported, but no general rule 
has emerged so far. Here, we show on the basis of 
experimental data in vitro and in vivo that cell persis- 
tence, which quantifies the straightness of trajec- 
tories, is robustly coupled to cell migration speed. 
We suggest that this universal coupling constitutes 
a generic law of cell migration, which originates in 
the advection of polarity cues by an actin cytoskel- 
eton undergoing flows at the cellular scale. Our anal- 
ysis relies on a theoretical model that we validate by 
measuring the persistence of cells upon modulation 
of actin flow speeds and upon optogenetic manipula- 
tion of the binding of an actin regulator to actin fila- 
ments. Beyond the quantitative prediction of the 
coupling, the model yields a generic phase diagram 
of cellular trajectories, which recapitulates the full 
range of observed migration patterns. 

INTRODUCTION 

Eukaryotic cell migration is essential for a large set of biological 
processes. Assessing quantitatively the exploratory efficiency of 
cell trajectories is therefore crucial. In the absence of external 
guidance, cell movement can be described as a random motion, 
and proposed models have ranged from simple Brownian mo- 
tion to persistent random walks (Selmeczi et al., 2008), Lévy 
walks (Harris et al., 2012), or composite processes such as inter- 
mittent random walks (Benichou et al., 2011). Such models differ 
in the cell persistence, which quantifies the ability of a cell to 
maintain its direction of motion. The variety of behaviors, 
observed even along a single cell trajectory, stems from the 



fact that, as opposed to a passive tracer in a medium at thermal 
equilibrium, which performs a classical Brownian motion, a cell is 
self-propelled, and as such, belongs to the class of active Brow- 
nian particles (Romanczuk et al., 2012). This class of processes 
is extremely vast and needs to be restricted to have some pre- 
dictive power. Up to now, universal behaviors have emerged in 
the context of the collective dynamics of self-propelled particles 
(Czirok et al., 1998; Toner et al., 2005; Vedel et al., 2013), but 
remain elusive at the level of single cells. 

Recently, a vast amount of data of individual cell trajectories 
has been collected over many cell types in the context of the First 
World Cell Race (Maiuri et al., 2012) (Figures 1A and 1B). These 
data show that, despite an apparent diversity at the level of the 
whole population analyzed, correlations between the mean 
linear instantaneous cell speed and the persistence time, defined 
as the mean time during which a cell maintains its direction of 
motion, exist. This general trend that faster cells migrate more 
straight than slower cells, which has since been observed by 
others (Wu et al., 2014), remains unexplained and suggests 
that robust mechanisms could constrain the possible character- 
istics of cell trajectories. 

In this work, we analyzed trajectories of cells migrating in 
various in vitro assays and in live tissues. We reveal on the basis 
of this extensive data set a universal coupling between cell 
speed and cell persistence (UCSP). To explain what appears 
as a universal law of cell migration, we developed a physical 
model relying on minimal hypotheses that shows that actin flows, 
which are the hallmark of motile cells (Theriot and Mitchison, 
1991; Svitkina et al., 1997; Wilson et al., 2010), reinforce cell po- 
larity and consequently cell persistence. The model was vali- 
dated experimentally by two assays: one enabling a gradual 
modulation of actin flow speeds combined with pharmacological 
interference experiments, and a second, based on an engi- 
neered optogenetic molecular module, which allowed control- 
ling an actin polymerization regulator, Arpin. Finally, the 
model has the following merits: it (1) quantitatively predicts 
the observed exponential correlation between speed and 
Figure 1. Correlation between Cell Persis- 
tence and Cell Speed 

(A) Population mean persistence time versus mean 
instantaneous speed. Data from the First World 
Cell Race; refer to the original article (Maiuri et al., 
2012) for the cell-type color code. Linear fit (solid 
line) and 0.95 confidence interval (gray). 

(B) Definition of mean persistence time and mean 
instantaneous speed in 1D. Cell contour color 
shows the time progression. Scale bar, 50 μm. 
(C-F) Persistence time, binned for the corre- 
sponding instantaneous speed, versus instanta- 
neous speed. (C) RPE1 cells on micropatterned 
lines of 9-μm width coated with fibronectin. (D) 
BMDCs in fibronectin-treated channels with a 7 × 
5 μm² square section. (E-F) The persistence time in 
2D is defined here as the time needed for a cell to 
change its original direction of motion by 90°. (E) 
Data of RPE1 cells on a 2D surface uniformly treated 
with fibronectin. (F) BMDCs confined between two 
parallel, fibronectin-treated planes, 5 μm apart 
from each other. 

(G) BMDCs embedded in bovine collagen gel and 
confined between two planes 5 μm apart from 
each other. 

(H) Myeloid cells imaged in live Medaka fish. 

Red curves represent the exponential fit of the 
experimental data. Black dots and gray lines are 
mean and SE for the binned data on both axes. 
See also Figures S1 and S2 and Movie S1. 
persistence that characterizes the UCSP, (2) provides from min- 
imal microscopic hypotheses an explicit construction of 
single-cell dynamics as active Brownian particles, and (3) yields a 
generic phase diagram of cellular trajectories, which recapitu- 
lates the full range of observed migration behaviors. 

RESULTS 

Cell Trajectory Analysis Reveals a Universal Coupling 
between Cell Speed and Persistence 

The First World Cell Race (Maiuri et al., 2012), which gathered re- 
cordings of individual cell trajectories on 1D adhesive tracks for 
54 different adherent cell types, made available an unprece- 
dented amount of data. We performed a further analysis of this 
data and confirmed a clear positive correlation between the pop- 
ulation averaged mean linear instantaneous cell speed and the 
population averaged persistence time at the level of all cell types 
(Figure 1A), despite their variety. To assess the robustness of this 
observation and check its validity at the level of single cell trajec- 
tories, we performed new experiments on two representative 
examples of migrating cells with either a mesenchymal migration 
mode (hTERT-immortalized retinal pigment epithelial cell line 
[RPE1]) or an amoeboid migration mode (immature bone 
marrow-derived mouse dendritic cells [BMDC]), in various geome- 
tries (Figures S1 and S2; Movie S1): 1D adhesive tracks (Fig- 
ure 1C), 1D microchannels (Figure 1D), 2D non-confined adhe- 
sive substrates (Figure 1E), and 2D confined substrates 
(Figure 1F). Very long cell trajectories were recorded (see Sup- 
plemental Information), allowing a clear assessment of individual 
cell persistence. In all cases, for which cell concentrations were 
low enough to treat cell trajectories as independent, we found a 
striking correlation between persistence time τ and mean instan- 
taneous velocity v, which is well fitted by a simple exponential 
curve τ = A e^{λv} before eventually saturating at larger v (see Dis- 
cussion). We then checked whether this correlation was still valid 
when cells were migrating in more complex environments. We 
recorded BMDCs migrating in 3D collagen gels (Figure 1G) and 
myeloid cells migrating in live Medaka fish (Grabher et al., 
2007) (Figure 1H). In both cases, persistence and speed were still 
highly correlated. Overall, these results show that the UCSP is 
robust when tested in different migration environments and or- 
ganisms, and that the correlation, observed at the population 
level or at the level of individual cells, exists all along a cell trajec- 
tory: a faster displacement is correlated with a straighter path. 
The analysis of this remarkably robust correlation that defines 
the UCSP is at the core of this work. 
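
An exponential relation between persistence time and speed becomes a straight line after taking the logarithm of the persistence time, which gives a simple way to estimate its two parameters. The sketch below fits such a curve by ordinary least squares on log-transformed values. This log-linear approach is an assumption chosen for brevity and is not necessarily the fitting procedure used for Figure 1; the names `fit_exponential`, `amplitude`, and `rate`, and the synthetic data, are hypothetical.

```python
import math


def fit_exponential(speeds, persistence_times):
    """Fit tau = amplitude * exp(rate * v) by least squares on log(tau)."""
    n = len(speeds)
    if n < 2:
        raise ValueError("need at least two data points")
    logs = [math.log(t) for t in persistence_times]
    mean_v = sum(speeds) / n
    mean_log = sum(logs) / n
    sxx = sum((v - mean_v) ** 2 for v in speeds)
    sxy = sum((v - mean_v) * (g - mean_log) for v, g in zip(speeds, logs))
    rate = sxy / sxx
    amplitude = math.exp(mean_log - rate * mean_v)
    return amplitude, rate


# Synthetic, noise-free data generated from known parameters (not study data).
speeds = [0.5, 1.0, 1.5, 2.0, 2.5]
taus = [2.0 * math.exp(0.8 * v) for v in speeds]
amplitude, rate = fit_exponential(speeds, taus)
print(amplitude, rate)
```

Because the fit is done in log space, it weights relative rather than absolute errors, and it cannot capture the saturation at larger speeds mentioned above, so it should only be applied to the range in which the exponential form holds.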

Faster Actin Retrograde Flow Lengthens Cell 
Persistence Time 

To elucidate the origin of the UCSP, we reasoned that, as this 
law applies to all cell types we tested, including cells with very 
different modes of migration (mesenchymal and amoeboid 
cells), it had to rely on a very conserved aspect of cell locomo- 
tion. Even if many details might vary from one cell type to the 
other, the most conserved aspect of cell locomotion is the retro- 
grade translocation of actin filaments in the frame of reference of 
the cell, from the front to the rear of its locomotory parts. Actin 
flows can either exist over large portions of the cell (in particular 



in amoeboid cells) (Renkawitz et al., 2009), or in some cases be 
essentially limited to the protrusive parts (such as lamellipodia in 
mesenchymal cells) (Theriot and Mitchison, 1991; Svitkina et al., 
1997; Wilson et al., 2010). This retrograde movement is powered 
by the combined forces of actin polymerization at the leading 
edge and actomyosin contraction at the trailing edge and repre- 
sents the driving force for locomotion. Upon coupling to the envi- 
ronment either via transmembrane adhesion receptors of the 
integrin family or via friction, retrograde actin flow is turned into 
traction forces, which pull the cell forward while actin filaments 
slide to the back of the cell (Mogilner and Oster, 1996; Jurado 
et al., 2005; Hawkins et al., 2011). 

To test if actin flows are involved in the coupling between cell 
persistence and cell speed, we used mature bone marrow den- 
dritic cells (mBMDCs), for which it was already demonstrated 
that actin retrograde flow can be varied without affecting cell 
speed when substrate adhesion strength is modulated (Renka- 
witz et al., 2009). Indeed, the cell speed v and the velocity of 
the actin flow V (defined hereafter in the frame of reference of 
the moving cell) are usually linearly coupled according to v = αV 
(Jurado et al., 2005), such linear approximation being valid at 
least in the lower range of velocities for each cell type. The coef- 
ficient α models the effective friction between actin filaments and 
the substrate, usually mediated by specific adhesion proteins 
(Mitchison and Kirschner, 1988; Gardel et al., 2010), and therefore 
depends on experimental conditions. In some cell types, varying 
the actin/substrate coupling parameterized by α induces an 
adaptive response, which allows a cell to keep the speed of loco- 
motion v relatively constant, despite different retrograde veloc- 
ities V of the loosely coupled actin network. Such adaptation 
has been extensively characterized in mBMDCs. When placed 
in confined environments, these cells can flexibly shift between 
integrin-independent and integrin-dependent force transduction 
as even in the absence of these adhesion receptors the cells are 
able to generate sufficient traction to migrate. However, deple- 
tion of integrins or their ligands and the associated drop in friction 
causes retrograde actin slippage, which is then compensated by 
up to 2-fold increase in actin polymerization speed (Renkawitz 
et al., 2009). This property of mBMDCs moving in confined envi- 
ronments allowed us to independently study the influence of 
retrograde actin flow and actual cell speed on migratory persis- 
tence. Surface adhesion was independently controlled by deple- 
tion of β2 integrins and their ligands (coating the surfaces with an 
inert PEG layer), while cell speed was gradually varied by temper- 
ature changes and by pharmacological inhibition of actomyosin 
contractility (blebbistatin). This allowed us to tune retrograde 
flows from 3 to 15 μm/min independently of cell speed (Figure 2A). 
Strikingly, we observed that the mean persistence time τ 
measured for each experimental condition was strongly corre- 
lated with the mean speed V of the actin retrograde flow and 
was well-fitted by a simple exponential τ = A e^{λV} (Figure 2B; 
Movie S2), whereas no correlation between τ and cell speed v 
was observed. Note, however, that within each experimental 
condition (α fixed) the linear scaling between v and V still held, 
so that the correlation between cell speed and cell persistence 
was preserved at the level of each experimental condition. These 
data strongly suggest that the observed UCSP originates from a 
coupling between cell persistence and the actin flow. 
Faster Actin Retrograde Flow Enhances the Asymmetry 
of Polarity Cues and Stabilizes Cell Polarity 

A simple hypothesis to explain these effects of actin flows would 
be that, in polarized migrating cells, actin flows reinforce cell po- 
larity by enhancing the asymmetry of polarity cues, as was pro- 
posed for the establishment of polarity in early Caenorhabditis 
elegans embryos (Munro et al., 2004; Goehring et al., 2011) (Fig- 
ures 2C and 2D). Examples of such cues, which are likely to 
vary with cell type, could be molecules responsible for the 
generation of contractile stress, such as myosin motors or their 
activators, actin polymerization regulators, or could alternatively 
be involved in the regulation of microtubule dynamics (Zhang 
et al., 2014). We introduce a first element of physical modeling 
to test this hypothesis. We consider a migrating cell and denote 
by x the coordinate in the reference frame of the moving cell, 
defined by the instantaneous direction of migration and by L 
the cell extension along this axis (Figure 2C). We are interested 
in the dynamics of the concentration of a generic polarity cue 
c(x, t), which we assume here depends only on x. This cue 
is assumed to specify the rear part of the cell, so that 
c(x = 0, t) > c(x = L, t) for a cell moving in the +x direction. The po- 
larity cue can either diffuse in the cytosol, with diffusion coeffi- 
cient D, or, depending on its affinity for actin, be advected by 
the cytoskeletal flow, whose velocity along x is denoted by -V 
(with V > 0) in the cell reference frame. We denote by k_on and 
k_off the corresponding binding and unbinding rates of the cue 
to the actin cytoskeleton. In the limit of fast exchange (k_on, k_off 
large), the dynamics of c(x, t) depends on advective transport 
and on diffusion as follows: 

\partial_t c(x, t) - \partial_x [\tilde{V} c(x, t)] = \tilde{D} \partial_x^2 c(x, t) + \partial_x \xi_c, (Equation 1) 

where \tilde{V} = V k_on / (k_on + k_off) and \tilde{D} = D k_off / (k_on + k_off); here \xi_c is a 
Gaussian white noise that encompasses the fluctuations of the 
flux of cue molecules. At steady state, assuming \tilde{V} constant 
and the conservation of the total amount of cues, the mean 
cue concentration profile is therefore given by: 

c_V(x) = C e^{-\tilde{V} x / \tilde{D}}, (Equation 2) 

where C is a normalization constant and the dependence on the 
retrograde flow is denoted in subscript. This simple argument, 
which does not take into account heterogeneities in actin 
flows, predicts simple exponential concentration profiles, whose 
steepness is controlled by the speed of the effective retrograde 
flow V (Figure 2D). Non-uniform actin flow profiles, as observed 
for example by Wilson et al. (2010) and Ofer et al. (2011), would 
change the exponential shape of concentration profiles, but 
would leave the dependence on the actin flow speed quali- 
tatively unchanged. To check this general prediction, we 
measured on motile mBMDCs the concentration profiles along 
the polarity axis for various molecules with different affinity for 
actin (Figure 2E; Movie S3): Tamra (that unspecifically labels 
cytoplasmic proteins), Lifeact-GFP (low affinity) (Riedl et al., 
2008), MLC-GFP (high affinity) (Kondo et al., 2011), and Utro- 
phin-GFP (high affinity) (Burkel et al., 2007). As expected from 
the model, we observed that increasing the actin retrograde 



flow could significantly increase the slope of the concentration 
profile of strong actin binders (MLC, Utrophin), whereas the pro- 
files of molecules with low (Lifeact) or no (Tamra) affinity to actin 
remained unchanged (Figures 2E and 2F). This validates our hy- 
pothesis that actin flows reinforce the asymmetry of concentra- 
tion profiles of actin binding molecules. 
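
The exponential steady-state profile, and the reason strong actin binders are expected to show steeper gradients, can be made explicit with a short derivation. The sketch below assumes zero net flux of the cue at both ends of the cell (x = 0 and x = L), averages out the noise term, and introduces N as a hypothetical symbol for the conserved total amount of cue; none of these notational choices are taken from the original text.

```latex
% Mean steady state of the advection-diffusion equation for the cue:
% set \partial_t c = 0 and average out the noise term.
\begin{align}
0 &= \partial_x\!\left[\tilde{V}\, c_V(x) + \tilde{D}\, \partial_x c_V(x)\right]
  && \text{(steady state)} \\
\tilde{V}\, c_V(x) + \tilde{D}\, \partial_x c_V(x) &= 0
  && \text{(assumed zero flux at } x = 0 \text{ and } x = L\text{)} \\
c_V(x) &= C\, e^{-\tilde{V} x / \tilde{D}}
  && \text{(exponential profile, larger at the rear } x = 0\text{)} \\
C &= \frac{N\, \tilde{V}}{\tilde{D}\left(1 - e^{-\tilde{V} L / \tilde{D}}\right)}
  && \text{(from } \textstyle\int_0^L c_V(x)\, dx = N\text{)} \\
\frac{\tilde{V}}{\tilde{D}} &= \frac{V\, k_{\mathrm{on}}}{D\, k_{\mathrm{off}}}
  && \text{(steepness in terms of the bare parameters)}
\end{align}
```

The last line shows that the steepness grows with both the flow speed V and the binding-to-unbinding ratio, which is consistent with the observation that only the high-affinity binders respond to changes in retrograde flow.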

We next reasoned that such a mechanism in principle applies to 
any diffusing molecule that interacts with actin, and in particular, 
to polarity cues. It is then expected that increasing actin flows 
should increase the asymmetry of the concentration profile of 
any polarity cue, thereby stabilizing cell polarization and conse- 
quently increasing cell persistence, according to a mechanism 
discussed in Svitkina et al. (1997) (note that the connection to 
persistence was, however, absent in this reference). To further 
support this hypothesis, we measured the number of protrusions 
for each cell, using the same nine experimental conditions to 
vary the actin flow and found that in conditions in which cells 
had a faster actin flow, there was a larger proportion of cells 
with a single well-polarized lamellipodium (Figures 3A, 3B, and 
S3). We then measured the life time of such unipolar configura- 
tions, which we call polarization time τ_p, and the actin retrograde 
flow speed in each protrusion, and found that they showed the 
same exponential correlation (Figure 3C). This suggests that 
the observed coupling between actin retrograde flow and cell 
persistence originates from the stabilization of polarity by faster 
actin flows, by favoring a unipolar configuration of the cell and by 
lengthening the protrusion lifetime. 

While in principle this mechanism can apply to any diffusing 
polarity cue that interacts with actin, species involved in the 
regulation of actin flow are the most likely to mediate the
UCSP. Cdc42, a member of the Rho GTPase family, is a central
regulator of polarity, mediating its function via the actin and
microtubule cytoskeleton (Heasman and Ridley, 2008), and
thus appeared as a natural candidate. Loss of Cdc42 in dendritic
cells severely affects cell polarization and migration efficiency
in vitro and in vivo (Lämmermann et al., 2009). To test whether the
Cdc42 polarity module was involved in coupling rearward
cortical flows to polarity, we selectively inhibited Cdc42 using
ML-141 (Hong et al., 2013) and measured the lifetime of cell
polarization Tp as a function of cortical flow speed V, which was
modulated via a set of five independent experimental conditions
as discussed above (Figure 2A). As expected, inhibition of Cdc42
led to an overall reduction of the polarization lifetime for all 
conditions but, importantly, longer polarization lifetimes were 
observed for increasing actin flow speeds V and the UCSP 
was preserved (Figure 3D). These results indicate that the 
UCSP can function independently of a Cdc42-mediated 
signaling pathway. 

We next sought to unravel key molecular polarity cues closely 
affected by the cortical network flow. Myosin-II is a prominent
candidate, which strongly binds to actin; it is thus transported 
with actin flows and tends to accumulate at the cell back when 
actin flows are strong (Poincloux et al., 2011; Hawkins et al., 
2011). It has previously been implicated in actomyosin network 
organization, polarized cortical architecture and migration effi- 
ciency (Vicente-Manzanares et al., 2009). Indeed, we observed 
that myosin II light chain localization was strongly influenced 
by cortical flow modulations with a more rearward accumulation 



Cell 161, 374-386, April 9, 2015 ©2015 Elsevier Inc. 377 




[Figure 2, panels A-F. Panel title: Actin Flow Modulation. Axes: actin retrograde flow V (µm/min); normalized position x. Labels: MLC-GFP, Wt; Tamra, Lifeact-GFP, MLC-GFP, Utrophin-GFP.]

Figure 2. Modulation of Actin Retrograde Flow Speed Reveals a Positive Feedback Loop on the Stability of Cell Polarity 

(A) Retrograde flow speed V in the reference frame of the cell in nine different conditions. Bars represent SEM.

(B) Cell persistence time versus retrograde flow speed. Bars represent SEM, and the gray line represents the exponential fit. Inset shows mean-square
displacement plot for Itg−/− cells at 37°C. Arrow highlights the crossover from persistent to random motility on longer time scales.

(C) Schematic illustrating the model with a minimal set of kinetic parameters. Polarity factors are shown as red dots, actin filaments are in blue while cell outline is
green and migration substrate is gray.

(D) Schematic showing the principle of protein redistribution by the actin retrograde flow. Density of transported protein is shown in red, and depends on the
position x along the cell polarity axis.

(E) Left: fluorescence images of MLC-GFP localization in migrating wild-type (Wt) and β2 integrin knockout (Itg−/−) dendritic cells under 2D confinement. Note the
enhanced depletion of MLC from the leading edge in the Itg−/− cell. Dashed lines indicate cell borders. Scale bar in µm. Right: kymograph performed along yellow
lines. White dashed lines indicate retrograde myosin flow in the lab reference frame.

(legend continued on next page) 

Figure 3. Stability of Cell Polarity as a Function of Actin Flow Speed and Effect of
Pharmacological Inhibition of Myosin II and Cdc42 in mBMDCs

(A) Differential interference contrast (DIC) representative
images of cells in the unpolarized state
(no lamellipodium), polarized state (one lamellipodium),
and with two lamellipodia (bi-polar) and
three or more lamellipodia (multi-polar). Arrows
denote the axis of polarization. Scale bar, 20 µm.

(B) Histogram of observed cell morphologies (color 
code see A) for nine different conditions (see 
Figure 2). 

(C) Polarization lifetime as a function of actin flow 
speed. Gray line indicates exponential fits to the 
data. Inset shows the exponential distribution of 
polarization lifetimes for Wt dendritic cells at 37°C. 
Bars represent SEM. 

(D) Polarization lifetime for control conditions (dark
gray), ML-141-treated cells (Cdc42 inhibitor, blue),
and Blebbistatin-treated cells (Myosin II inhibitor,
red). Actin retrograde flow speed V is modulated
(from left to right) using wild-type (Wt) dendritic
cells (DCs) at 30°C, Wt DCs at 37°C, β2 integrin
knock-out (Itg−/−) DCs at 30°C, Itg−/− DCs at 40°C,
and Itg−/− DCs at 41.5°C. Bars represent SEM.
See also Figure S3 and Movies S2 and S3. 



for higher actin flow speeds V (Figure 2F). To test a potential role
of myosin-II in the UCSP, we pharmacologically interfered with its
activity by using the myosin-II inhibitor Blebbistatin. Inhibition of
myosin-II induced a reduction of cell polarization lifetimes for all
conditions, and polarization lifetimes Tp were no longer correlated
to cortical flow speed V but remained at constantly low values for
all V (Figure 3D). These data suggest that myosin-II constitutes
an indispensable polarity cue in mBMDCs that is involved in
the establishment of the UCSP.

Physical Modeling Predicts the UCSP 

To substantiate our hypothesis that the UCSP originates from 
the coupled dynamics of actin flow and diffusing polarity cues, 
we developed a minimal theoretical model, which assumes 
that the actin flow V is also subject to fluctuations, due for
example to the stochasticity of polymerization/depolymerization
processes or the heterogeneity of the environment. The model,
whose main ingredients are summarized below (see Supplemental
Information for details), relies on the key assumption
that the mean value V* of the actin flow (for a fixed cue concentration
profile) is governed by the asymmetry of the cue concentration
profile. More precisely, we assume that

$$V^{*} = \beta \left( c^{*}(0,t) - c^{*}(L,t) \right), \qquad \text{(Equation 3)}$$



where β is an effective parameter that controls the intensity of the
coupling between the actin flow and the asymmetry of the cue
concentration profile and can be interpreted as the maximal
possible velocity of the actin flow. Here, c*(x, t) denotes the fraction
of activated cues, i.e., cues that induce actin flow. We show
in the Supplemental Information that the phenomenological
coupling (Equation 3) covers the cases where actin flows are
generated by asymmetric distributions of either actin polymerization
regulators (such as Arpin, see the discussion below) (Jülicher
et al., 2007) or activators of contractility (such as Myosin
II or a Myosin II activator, as discussed above) (Hawkins et al.,
2011; Bois et al., 2011; Callan-Jones and Voituriez, 2013), which
are the two main scenarios that we propose. We do not aim here
at describing in detail the biochemical steps involved in the
process and assume a classical Hill response function of index
n (results are qualitatively unchanged for other choices):

$$c^{*}(x,t) = \frac{c^{n}(x,t)}{C_s^{n} + c^{n}(x,t)}. \qquad \text{(Equation 4)}$$

Here, Cs is the concentration of cues above which activation is
saturated and is therefore determined by the maximal concentration
of activated cues.
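
As an illustration of the coupling in Equations 3 and 4, the following sketch evaluates the Hill activation and the resulting mean flow for given cue concentrations at the rear (x = 0) and at the front (x = L). It assumes the standard Hill form c^n / (Cs^n + c^n) for the activated fraction; the function names and numerical values are hypothetical and are not taken from the paper.

```python
def hill_activation(c, c_s, n=1.0):
    """Fraction of activated cues at local concentration c (Hill index n).

    Assumes the standard Hill form c**n / (c_s**n + c**n), so the result
    lies between 0 (no cue) and 1 (saturation, c much larger than c_s).
    """
    if c < 0.0 or c_s <= 0.0 or n <= 0.0:
        raise ValueError("require c >= 0, c_s > 0 and n > 0")
    cn = c ** n
    return cn / (c_s ** n + cn)


def mean_actin_flow(beta, c_rear, c_front, c_s, n=1.0):
    """Mean flow V* = beta * (c*(rear) - c*(front)), as in Equation 3."""
    return beta * (hill_activation(c_rear, c_s, n)
                   - hill_activation(c_front, c_s, n))


if __name__ == "__main__":
    # Hypothetical numbers: a cue enriched at the rear drives a positive flow.
    print(mean_actin_flow(beta=10.0, c_rear=3.0, c_front=0.5, c_s=1.0, n=1.0))
```

Because each activated fraction is bounded by 1, the flow returned here can never exceed beta in magnitude, which matches the interpretation of β as the maximal possible flow velocity.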



(F) Normalized concentration profiles as a function of the normalized position x along the cell polarity axis (cell front is set at x = 1) for a set of actin-binding
molecules with various actin affinities. Concentration profiles are color coded for different cell-substrate adhesion strengths (green, Wt; magenta, Wt on PEG;
brown, Itg−/−; all at 37°C). Arrows indicate a shift of the polarity cue concentration profile toward the cell rear with increasing actin retrograde flow. Right: normalized
distance x50 at which the concentration drops to c50 = c0/2 (***p < 0.001; n.s., not significant).

See also Movies S2 and S3. 

Next, we make use of the fact that the typical diffusion time of
cues over the cell length, L²/D (of the order of seconds), is significantly
shorter than the characteristic timescale of fluctuations in
the actin flow (of the order of minutes). The concentration
of cues is then taken at steady state, and Equation 1 implies
that it is a Poisson random variable, i.e., c(x, t) = c_V(x) + δc with
⟨δc²⟩ = K_c c_V(x), where K_c is a constant that controls the intensity
of the particle number fluctuations and c_V(x) is the steady-state
profile defined in Equation 2. Considering a cell moving on a 2D
substrate (the 1D case is then deduced by taking a vanishing
angular diffusion, and the generalization to 3D migration is
straightforward, see Supplemental Information), the dynamics
of the actin flow velocity V, which is a vector of modulus V and
polar angle φ, can then be written after linearization with respect
to δc (see Supplemental Information for details):



Figure 4. Model Predictions: The UCSP Law 
and Phase Diagram of Cell Trajectories 

The values of the parameters used in the figure
(except for β, Cs) and the fitting procedure are
described in the Supplemental Information.

(A) Persistence time as a function of the normalized
mean velocity V/β0, where β0 = D/L, for the prediction
of the model and for all available experimental
data (rescaled).

(B) Phase diagram of cell trajectories in the β
(maximal actin flow speed) and Cs (maximal concentration
of activated cues) plane. Symbols
correspond to the parameters used in (C) and (F)
(x), (D) and (G) (0), and (E) and (H) (+).

(C-E) Effective potential W as a function of the
velocities Vx and Vy normalized by β0. The singularity
in W at the origin V = 0 is due to the anisotropy
of the diffusion process (Romanczuk et al., 2012).

(C) W is locally quadratic for V small: diffusive
phase (marked by x in B). (D) W has a “sombrero”
shape: persistent phase (marked by 0 in B). (E) W
has a mixed shape: intermittent phase (marked
by + in B).

(F-H) Examples of simulated trajectories in the 
diffusive (F), persistent (G), and intermittent phases 
(H). Colored crosses indicate the positions at 
regular time intervals. Circles in (F) indicate the cell 
size L. 

See also Figures S4 and S5. 



$$\partial_t V = \gamma\, F(V) + \sigma(V)\, \zeta_V, \qquad \text{(Equation 5)}$$

where ζ_V and ζ_φ are Gaussian white noises of variance unity.
Here, γ⁻¹ is the typical timescale of the actin flow fluctuations
and K controls their amplitude; the effective force F(V) and noise
intensity σ(V) are given explicitly in the Supplemental
Information. The dynamics of V, fully defined by Equation 5, is
therefore the dynamics of a Brownian particle in a force
field F(V) in the presence of a non-trivial
noise with additive and multiplicative
parts. Assuming that, for any given experimental condition,
the cell velocity is directly proportional to the actin flow
velocity (v = aV, where we hereafter set a = −1 for the sake of
simplicity), Equation 5 makes it possible to fully characterize
the resulting cell trajectories. In particular, the steady-state
distribution P(v) of the velocity can be obtained as (see Supplemental
Information):

$$P(v) = N \exp\left( 2\gamma \int^{v} du\, \frac{F(u)}{\sigma^{2}(u)} \right) = N e^{-W(v)}, \qquad \text{(Equation 6)}$$

where W(v) is an effective potential (Figures 4C-4E) and N
a normalization constant, and the persistence time can be
deduced from the analysis of the autocorrelation function. In
turn, the polarization time Tp (defined as the mean lifetime of a
cellular configuration, Figure 3C) can be analytically obtained
as the mean first-passage time (Condamin et al., 2007) at v = 0
(see Supplemental Information).

The analysis of the model reveals that the polarization time
Tp(v) (obtained analytically) and persistence time τ(v) (obtained
from Monte Carlo simulations of Equation 5) can be very well
approximated by a simple exponential, τ, Tp ≈ A e^{λv}, in dimensions
1 , 2, and 3 for a wide range of parameters as soon as the Hill in- 
dex satisfies 0.7 <n<^.5, which shows that no strong nonline- 
arity needs to be invoked to fit the data. Remarkably, this allowed 
us to fit all the available data (Figures 4A and S4; Table S1) with a
single universal exponential master curve and therefore repro- 
duce quantitatively the UCSP law. This provides a first clear vali- 
dation of the model. 
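
The kind of exponential fit referred to above can be sketched as a linear regression on the logarithm of the persistence time. The form τ = A·exp(λ·v) and the names `A` and `lam` are notation chosen for this sketch, the data are synthetic, and this is not the fitting procedure of the paper (which is described in its Supplemental Information).

```python
import math


def fit_exponential(speeds, times):
    """Least-squares fit of log(tau) = log(A) + lam * v; returns (A, lam).

    speeds: mean instantaneous speeds v (e.g., um/min)
    times:  persistence (or polarization) times tau, all strictly positive
    """
    if len(speeds) != len(times) or len(speeds) < 2:
        raise ValueError("need two or more matching (speed, time) pairs")
    logs = [math.log(t) for t in times]
    n = len(speeds)
    mean_v = sum(speeds) / n
    mean_log = sum(logs) / n
    s_vv = sum((v - mean_v) ** 2 for v in speeds)
    if s_vv == 0.0:
        raise ValueError("speeds must not all be identical")
    s_vl = sum((v - mean_v) * (l - mean_log) for v, l in zip(speeds, logs))
    lam = s_vl / s_vv                       # slope of the log-linear fit
    A = math.exp(mean_log - lam * mean_v)   # prefactor from the intercept
    return A, lam


if __name__ == "__main__":
    # Synthetic, noise-free example generated with A = 2 and lam = 0.5.
    v_values = [1.0, 2.0, 3.0, 4.0]
    tau_values = [2.0 * math.exp(0.5 * v) for v in v_values]
    print(fit_exponential(v_values, tau_values))
```

Fitting in log space weights relative rather than absolute errors, which is a common choice when the data span more than one order of magnitude; a direct nonlinear fit would be an equally valid alternative.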

Synthetic Engineering of a UCSP Module Using an 
Optogenetic Approach 

The mechanism underlying our model is that the advection of a
rear-associated polarity cue from the cell leading edge to the 
rear (of the cell or of a protrusion) by the actin retrograde flow 
would contribute to the concentration of the cue at the rear 
and depletion from the front. This would reinforce cell polarity 
(or increase the lifetime of the protrusion), thus increasing cell 
speed and persistence, generating a positive feedback loop. 
To test this experimentally, we developed a synthetic system
based on Arpin and on the CRY2/CIBN optogenetic tool (Fig- 
ure 5A). Arpin is a negative regulator of the Arp2/3 complex 
that is known to decrease lamellipodial protrusion and that 
does not directly bind to actin (Dang et al., 2013). CRY2 and 
CIBN are two independent proteins that rapidly form a
heterodimer after blue light illumination (Kennedy et al., 2010). To
directly test the central hypothesis of our model, we fused Arpin 
to CRY2 (and to mCherry to track Arpin localization), while two 
actin binding proteins that strongly differ in their affinity to actin, 
LifeAct and Utrophin-CH (Utr) (Figure 2F), were fused to CIBN 
(and to GFP). As expected, these fused proteins localized consti- 
tutively to actin filaments. While blue light illumination did not 
modify cytoplasmic localization of Arpin-CRY2 expressed alone, 
when it was co-expressed with LifeAct-CIBN or Utr-CIBN, blue 
light illumination induced its recruitment to actin filaments (Fig- 
ure 5B; Movie S4). To simplify the analysis, cells were plated 
on thin line patterns, thus imposing a single protrusion (Doyle 
et al., 2009) and only two possible directions of migration. 
Continuous whole cell blue light illumination generated concen- 
tration gradients of Arpin-CRY2 from front to back of the cell, re- 
flecting the binding of Arpin-CRY2 to actin and its advection with 
the retrograde flow (Figure 5C). Upon binding through Utrophin- 
CIBN the concentration profile rapidly changed to reach a new 
steady state within ~10 min (Figure 5D). Different steady-state 
slopes were generated based on the association time of actin- 
binding proteins to actin, faint for LifeAct-CIBN, but much stron- 
ger for Utrophin-CIBN (Figure 5E), as expected from Figure 2F. 
This artificial system thus proved well suited to test our model: 
a negative regulator of actin polymerization, Arpin, was advected 
back from the cell leading edge by the actin retrograde flow, and 
the advection was more efficient for a stronger binding to actin. 

We next assessed how this advection would affect cell migra- 
tion. As expected, Arpin-CRY2 overexpression alone strongly 
impaired both cell speed and persistence (Dang et al., 2013), 
and co-expression of LifeAct-CIBN, which produces a weak 
backward advection of Arpin, was not able to rescue this effect 
after blue-light illumination. Conversely, co-expression of Utro- 
phin-CIBN completely rescued cell speed and persistence time 
(Figures 5F and 5G), consistent with the different advection rates 



of the Utrophin and Lifeact probes. To rule out that this effect 
was due to more efficient global depletion of Arpin from the cyto- 
plasm upon binding to actin filaments via the Utrophin probe, we 
measured Arpin-CRY2 depletion from the cytoplasm after light- 
induced recruitment to the actin cytoskeleton via LifeAct-CIBN 
or Utrophin-CIBN. This showed that depletion was similar for 
both probes (Figure 5H). We could thus conclude that the rescue 
of cell speed and persistence observed with the Utrophin probe 
was specifically due to the advection of Arpin by the actin retro- 
grade flow. This set of experiments provided two important 
validations of our model: first it showed that the effect of actin 
retrograde flow on cell persistence, via stabilization of a protru- 
sion, which we observed in dendritic cells, could be reproduced 
in a completely different cell type, RPE1 cells. Second, it showed 
that direct engineering of an artificial system involving the 
minimal set of elements used in our model was enough to reca- 
pitulate the effects predicted by the model. 

Phase Diagram of Main Cell Migration Patterns

In addition to the prediction of the UCSP, our model provides 
through Equation 5 an explicit construction of a cell trajectory 
as that of an active Brownian particle (Romanczuk et al., 2012). 
While this concept has already proved useful for modeling
cell trajectories phenomenologically (Selmeczi et al., 2008), so
far no such bottom-up approach was available. The analysis
of Equation 5 yields a very rich phase diagram as a function of
β and Cs, which predicts three main classes of trajectories, as
detailed below (Figures 4B-4H and S5; see Supplemental Infor- 
mation for details). 

Brownian Trajectories 

For β smaller than a critical value βc(Cs), the potential W(v) has a
generic bowl shape centered at v = 0 and the process can be well
approximated by a classical Ornstein-Uhlenbeck process
(Gardiner, 2004) (Figures 4B, 4C, and 4F). This regime of slow
maximal actin flow is characterized by an autocorrelation of
v that decays exponentially with a short characteristic time
τ ~ 1/γ, so that there is no stable polarized state. At time scales
larger than 1/γ, trajectories are then Brownian-like.
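
The exponential decay of the velocity autocorrelation in this regime can be illustrated with a one-dimensional Ornstein-Uhlenbeck process, dv = -γ v dt + σ dW. The sketch below assumes a constant noise amplitude `sigma` and uses hypothetical parameter values; it is a generic illustration of the Ornstein-Uhlenbeck approximation, not a simulation of the full model.

```python
import math
import random


def simulate_ou(gamma, sigma, dt, n_steps, seed=0):
    """Sample dv = -gamma * v * dt + sigma * dW with the exact one-step update."""
    rng = random.Random(seed)
    decay = math.exp(-gamma * dt)
    # Standard deviation of the noise accumulated over one time step.
    step_std = sigma * math.sqrt((1.0 - decay * decay) / (2.0 * gamma))
    v = 0.0
    series = []
    for _ in range(n_steps):
        v = v * decay + step_std * rng.gauss(0.0, 1.0)
        series.append(v)
    return series


def autocorrelation(series, lag):
    """Normalized autocorrelation of a zero-mean series at an integer lag."""
    n = len(series) - lag
    if lag < 0 or n <= 0:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    num = sum(series[i] * series[i + lag] for i in range(n)) / n
    den = sum(x * x for x in series) / len(series)
    return num / den


if __name__ == "__main__":
    gamma, dt = 1.0, 0.01
    velocities = simulate_ou(gamma, sigma=1.0, dt=dt, n_steps=200000, seed=1)
    lag_steps = int(round(1.0 / (gamma * dt)))  # lag equal to 1/gamma
    print(autocorrelation(velocities, lag_steps), math.exp(-1.0))
```

At a lag of 1/γ the estimated autocorrelation should be close to exp(-1), up to sampling noise, which is the sense in which the memory of the direction of motion is lost after a time of order 1/γ.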

Persistent Trajectories 

For β > βc(Cs) and Cs larger than a critical value Cc(β), W(v) has a
“sombrero shape,” which is the hallmark of systems with broken
symmetry (Figures 4B, 4D, and 4G). In this regime of fast
maximal actin flow and large maximal concentration of activated
cues, the modulus of the velocity v fluctuates around a non-zero
average. Trajectories then correspond to a persistent random
walk with long-lived polarization time, which is exponentially
larger than 1/γ.

Intermittent Trajectories

For β > βc(Cs) and Cs < Cc(β), W(v) has a mixed shape, with both a
local minimum around v = 0 and a secondary minimum for a non- 
zero value of V (Figures 4B, 4E, and 4H). This regime of fast 
maximal actin flow and small maximal concentration of activated 
cues leads to intermittent trajectories (Benichou et al., 2011), 
characterized by an alternation of Brownian and persistent 
phases. The stabilization of the Brownian phase (around v = 0) 
is due to the multiplicative noise term in Equation 5 (see Mallick 
and Marcq, 2004): the small maximal concentration of activated 
cues induces large fluctuations that are enhanced for large actin 

Figure 5. Optically Triggered Advection of Arpin by the Actin Retrograde Flow

(A) Schematic of the principle of the experiment: Arpin-CRY2 overexpression inhibits Arp2/3-dependent actin polymerization. Blue light illumination induces 
CIBN/CRY2 heterodimerization and the subsequent transient binding of Arpin-CRY2 to the co-expressed actin binding protein, LifeAct-CIBN (left) or Utr-CIBN 
(right). Binding to Utr-CIBN, but not binding to LifeAct-CIBN, induces depletion of Arpin from the leading edge. 

(B-H) RPE1 cells plated on 9-µm-wide fibronectin-coated line micro-patterns, transiently transfected with three sets of constructs: Arpin-CRY2-mCherry alone,
Arpin-CRY2-mCherry and LifeAct-CIBN(-GFP), or Arpin-CRY2-mCherry and Utr-CIBN(-GFP). (B) Confocal images of Arpin-CRY2-mCherry before (-) and after
(+) blue light illumination in representative cells expressing the three sets of constructs. (C and D) Time-lapse images of the leading edge of the cells shown in (B).
(D) Time evolution of Arpin-CRY2-mCherry fluorescence intensity profile after light-induced binding to Utr-CIBN in the leading edge of a representative cell. Time
is color-coded. In (D) and (E), the normalized position x is defined along the cell polarity axis, where the cell front is set at x = 1. (E) Average steady state and SE of
normalized Arpin-CRY2-mCherry fluorescence intensity profiles after blue light illumination at the leading edge of moving cells transfected with the three sets of
constructs. (F and G) Instantaneous speed (F) and persistence time (G) of non-transfected cells (Ctrl) or cells transfected with the three sets of constructs. In
boxplots: middle bars are medians, the rectangles span from the first to the third quartiles, and bars extend to ±1.5×IQR. One-tailed t test. (H) Arpin-CRY2-mCherry
depletion from the cytoplasm. Fluorescence was measured in the cytoplasm away from actin-rich regions (red circle, top panel) for cells expressing the
three sets of constructs (two-tailed t test, error bars are SD). ***p < 0.001; n.s., not significant.

See also Movie S4. 



flows. The resulting dependence on y of the noise intensity leads 
to an effective restoring force toward the unpolarized state v = 0. 

The analysis above shows in particular that the general 
coupling that we describe between actin flow and polarity cues 
is in principle sufficient to induce cell polarization (see the transi- 



tion from diffusive to persistent or intermittent trajectories for 
increasing β). We, however, do not claim that this is the only
mechanism responsible for cell polarization. In fact, a preexisting 
polarization mechanism, for example of Turing type, can be 
included in the model (see Supplemental Information). In such 



Figure 6. Experimental Cell Trajectories 
Can Be Classified in the Three Classes Pre- 
dicted by the Model 

Experimental cell trajectories can be classified in 
the three classes predicted by the model: diffusive 
(A, D, and G), persistent (B, E, and H), and inter- 
mittent (C, F, and I). 

(A-C) Examples of BMDC migration patterns of
each type (temporal overlay of phase contrast
images). Color code indicates the time course
(total duration: (A) 276 min, (B) 72 min, and (C)
141 min). Scale bar, 100 µm (insets: 25 µm).

(D-F) Corresponding trajectories extracted from
automated tracking of the nucleus. Circles indicate
the confidence interval (3 µm). Blue stands for cell
speed v > 4 µm·min⁻¹ and red for v < 4 µm·min⁻¹.

(G-I) Histograms of velocities extracted from the
corresponding experimental tracks are in agreement
with the distribution of velocities P(v) (solid
black line) from the model with parameters determined
in the Supplemental Information and β, Cs
as indicated in Figure 4B: diffusive phase (+),
persistent (*), and intermittent (x).

See also Movie S5. 




a case, we found that the coupling to actin flows again results in 
an increased polarization lifetime and thus an increase in cell 
persistence. This shows the validity of the UCSP independently 
of the presence of a preexisting polarization mechanism. 

The three classes of trajectories predicted by the model have 
been reported repeatedly in the literature (Selmeczi et al., 2008; 
Vedel et al., 2013), which provides a further validation of the 
model. To test this prediction more quantitatively, we analyzed 
2D trajectories of BMDCs (Figures 6A-6F; Movie S5) obtained 
by automated tracking of cell nuclei. For each analyzed trajectory,
the measured velocity distribution P(v) could be well fitted
by the model by adjusting only β and Cs, while all other parameter
values were kept as in Figure 4 (Figures 6G-6I). We found that
indeed all trajectories could be classified according to the above
three classes predicted by the model (Brownian, persistent, intermittent)
depending on the value of the parameters β and Cs only, in
agreement with the predicted phase diagram (Figure 4B). 

DISCUSSION 

Maintenance of cell polarity in time determines long-term cell 
migration patterns. Different molecular regulators of cell polarity 



in motile cells have been identified in the 
past. However, mechanistic models with 
predictive capacity from molecular scales 
to global cell behavior (e.g., cell shape, 
speed, and persistence) remain sparse 
mainly due to a limitation of experimen- 
tally accessible parameters and the large 
number of involved components. In this 
work, we describe a positive feedback 
loop between actin flows and mainte- 
nance of cell polarity in motile cells. 
This positive feedback impacts on the 
long-term migration behavior of cells and results in a higher 
persistence for faster cells— the UCSP law. By combining exper- 
imental results with theoretical modeling we suggest that actin 
flows are involved in reinforcing an asymmetric molecular distri- 
bution during cell polarization and thus increasing polarization 
lifetime. The two main predictions of the model were validated 
experimentally: (1) we first showed that it quantitatively predicts 
the UCSP law, and (2) we next demonstrated that it reproduces 
the main migratory behaviors reported so far. In addition, we 
could engineer a synthetic module based on the main ingredi- 
ents of the model, to optically modulate the UCSP. We thus 
believe that our model provides an important step toward a 
comprehensive view on cell motility from microscopic to large- 
scale cell behavior. 

The persistence time depends exponentially on the mean
instantaneous velocity, which characterizes the UCSP (Figure 1),
in the lower range of speeds for each cell type, and a saturation
to a plateau is observed at larger speeds. Such saturation could
have several causes. The model primarily predicts an exponential
dependence of the polarization time Tp on the actin flow
speed V. Such dependence Tp(V) implies a similar exponential
dependence τ(v) between persistence time and cell speed
under two conditions. First, it requires that polarization time Tp
and persistence time τ can be identified. While this is clear in
1D geometries, this does not always hold in dimensions 2 and
3, where two effects compete to destroy persistence: depolarization,
characterized by the timescale Tp, and angular diffusion,
characterized by the timescale τφ ~ β²/K. One therefore expects
that τ ~ Tp only for Tp < τφ, while τ saturates for larger
values of Tp. Second, it requires that cell speed and actin flow
speed are linearly coupled (v = aV). As we argued, such linear 
dependence generally holds in the lower range of speeds for 
each cell type. Non-linear effects (that could be due for example 
to a motor-clutch mechanism) (Mitchison and Kirschner, 1988) 
are, however, expected at larger speeds and could result in 
the observed saturation of the persistence time. 

An important observation we made is that this law also applies 
at the subcellular scale, to individual cell protrusions. This has 
two consequences: first it explains why cells with various modes 
of migration follow the UCSP law, as it might apply to any loco- 
motory subpart of the cell and the nature of the protrusions does 
not matter. Second, it suggests that such coupling between 
actin filaments flow rate (or even flows of other cytoskeletal ele- 
ments) and lifetime of polarity might apply to other phenomena 
than cell migration, such as polarized secretion or growth, which 
also rely on actin polymerization. 

In this work, we have validated the law on cells of mammalian 
origin migrating in 1D and 2D geometries, with or without
confinement and in 3D collagen gels— the main in vitro migration 
assays. We also extended our finding to myeloid cells moving 
in live tissues in Medaka fish. The process we describe is so 
generic, that it is likely to apply to cells from other organisms. 
Indeed a very similar correlation between speed and persistence 
was reported from migrating amoeba (Miyoshi et al., 2003; Gole 
et al., 2011). This law a priori only applies to random migration 
and not to guided migration. It is, however, likely that the mech- 
anism that we describe helps reinforce a weak external guidance 
and it might thus be also important for guided migration. As a first 
modeling step in that direction, we showed that our model and 
the UCSP still hold if an independent polarization module exists. 

In this work, we did not aim at investigating in more detail the
molecules that might, for a given cell type, be responsible for
coupling the flow to the polarity. We believe that, even in a given
cell type, there might be several molecules that could contribute
to the coupling. For example, one of the most obvious players
would be Myosin II, as it fits the two requirements: it is transported
by actin filaments and the steeper its gradient, the faster the
flow. Such a mechanism inspired us in the choice of the feedback 
equation we used in the model (Equation 3). This hypothesis was 
validated in the case of BMDCs, which are typical amoeboid cells 
whose motility strongly relies on the activity of Myosin II. Never- 
theless, inhibiting Myosin II was also shown to induce a stronger 
Rac-dependent protrusive activity and thus a stronger polymeri- 
zation-based actin flow (Sanz-Moreno et al., 2008), which can 
also transport polarity cues and thus compensate for the loss of 
Myosin II, at least in protrusion-based migration in mesenchymal 
cells. Last, we stress that our analysis does not exclude polarity 
cues involved in the regulation of other key actors of cell polarity 
such as microtubules (Zhang et al., 2014), as long as they interact
at least indirectly with actin. 



One of the most promising aspects of the model we have proposed
based on the UCSP is that it can generate the full range of
observed cell trajectories with only two main parameters: the
maximal actin flow velocity β and the maximal concentration of
activated cues Cs. While the latter might be difficult to vary
experimentally, we have shown that the former is in fact quite 
versatile (Figure 2A), which opens the way to the control of cell 
migration patterns. We could experimentally implement an opto- 
genetic synthetic module that performs that task by inducing 
binding of a negative regulator of actin polymerization (Arpin) 
to actin filaments. Upon blue light illumination, Arpin was 
coupled to Utrophin and thus depleted from the migration front 
by retrograde advection with the actin flow, which in turn stabi- 
lized the protrusion. It is possible that some cells regulate bind- 
ing of polarity factors to actin to tune their persistence. Making 
use of recent results that identified optimal search patterns for 
both persistent (Tejedor et al., 2012) and intermittent trajectories 
(Benichou et al., 2011), we anticipate that the search efficiency 
for a target (e.g., DCs searching for antigens in a tissue) (Heuze 
et al., 2013) could be optimized by tuning β and Cs. The model
we propose thus provides a very generic ingredient of cell migra- 
tion that could be used as a basis to model any process in which 
individual cell trajectories matter, such as search processes 
by immune cells (Harris et al., 2012), neuronal cell migration,
or invasion by cancer cells. 

EXPERIMENTAL PROCEDURES 

In brief (see the Extended Experimental Procedures for a detailed description 
of the methods), bone marrow-derived dendritic cells were generated from 
the bone marrow extracted from mice and cultured as previously described. 
Human retinal pigment epithelial cells (RPE1, Clontech) were grown in standard
conditions. Medaka fish (Oryzias latipes) stocks were maintained as previously
described. Itgb2−/−, Itgb2−/− Lifeact-GFP, and wild-type mice were kept
on a C57BL/6 background and bred in a conventional animal facility at IST
Austria according to local regulations.

For 1D migration assays, RPE1 cells were placed on line-shaped micro-patterns 
coated with fibronectin, and BMDCs were placed in fibronectin-coated microchannels. 
For 2D migration, cells were either regularly plated on a fibronectin-coated 
surface, or plated between two surfaces separated by micro-fabricated 
spacers to introduce confinement. To obtain a large cell tracking area, nine adjacent 
fields of view were recorded and image stacks were stitched by 
custom-written software. For 3D migration, collagen gels were prepared using 
1.6 mg/ml of bovine collagen (PureCol, Advanced BioMatrix). BMDCs were 
embedded in polymerizing collagen and then immediately confined between 
two glass surfaces spaced 5 μm apart. 

Medaka myeloid cell live imaging was performed on fish at 9-11 days post-fertilization 
mounted as previously reported. Under-agarose assays were performed 
with cells inserted between an agar gel and a Petri dish, as previously 
described. Blebbistatin (Sigma) was used at a final concentration of 10-20 μM. 
ML-141 (Sigma) was used at a final concentration of 20 μM. BMDCs and RPE1 
cells were transfected using commercial transfection kits (from Amaxa and 
Roche or Invitrogen, respectively) according to the manufacturers' recommendations. 
Nuclei were stained with Hoechst. 

Time-lapse movies of cell nuclei in 1D, 2D, 3D, and in vivo were analyzed by 
a custom-written program as previously described. Medaka myeloid cells and 
BMDCs under agarose were manually tracked with Fiji. In 1D, the persistence 
time is defined as the time a cell moves in the same direction; in 2D, 3D, and 
in vivo, as the time it takes for the cell to change its initial direction by 90°. 
The differential angle is the angle between two consecutive displacements 
of a cell. BMDC under-agarose tracks were analyzed using in-house algorithms 
implemented in MATLAB (MathWorks). The mean-square displacement 
(MSD) was fitted according to Fürth's formula in order to extract the 
persistence time. Actin dynamics were analyzed by kymographs as previously 
described. Protein relocalization was quantified using Fiji. For the analysis of 
polarization lifetimes, migrating cells were imaged every 10 s and the number 
of protrusions was recorded and saved as vectors. 
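
As an illustration of the MSD fit described above, the following minimal sketch assumes the standard form of Fürth's formula for a persistent random walk, MSD(t) = 2nD(t − P(1 − exp(−t/P))), where n is the number of dimensions, D the diffusion coefficient, and P the persistence time. The function names, the synthetic data, and the coarse grid-search fit are hypothetical choices made for this example; they are not the in-house MATLAB implementation used for the analysis.

```python
import math


def furth_msd(t, diffusion, persistence, dims=2):
    """Fürth's formula: MSD of a persistent random walk in `dims` dimensions."""
    return 2 * dims * diffusion * (t - persistence * (1 - math.exp(-t / persistence)))


def fit_persistence(times, msd, candidates):
    """Grid-search the persistence time P over `candidates`.

    For each candidate P the model is linear in D, so the least-squares
    D is obtained in closed form; the (P, D) pair with the smallest
    sum of squared errors is returned.
    """
    best = None
    for p in candidates:
        basis = [furth_msd(t, 1.0, p) for t in times]  # model curve for D = 1
        d = sum(b * m for b, m in zip(basis, msd)) / sum(b * b for b in basis)
        sse = sum((d * b - m) ** 2 for b, m in zip(basis, msd))
        if best is None or sse < best[0]:
            best = (sse, p, d)
    return best[1], best[2]


# Synthetic, noise-free example (illustrative values only, not measured data).
times = [float(i) for i in range(1, 61)]        # time lags, e.g. in minutes
true_p, true_d = 8.0, 2.5
msd = [furth_msd(t, true_d, true_p) for t in times]
candidates = [0.5 * k for k in range(1, 61)]    # candidate P from 0.5 to 30
est_p, est_d = fit_persistence(times, msd, candidates)
```

On real tracks a continuous optimizer and a restriction to short time lags would usually be preferable; the grid search is used here only to keep the sketch dependency-free.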

RPE1 cells transfected with Arpin-CRY2-mCherry were imaged with an inverted 
spinning disk confocal microscope (Roper/Nikon) at 60× magnification. 
The mCherry channel was acquired every minute; then, after 10 min, 
Arpin-CRY2-mCherry recruitment to F-actin (using actin-bound LifeAct-CIBN-GFP 
or Utrophin-CIBN-GFP) was induced by illuminating also in the GFP channel. 
For statistical analysis of trajectories, cells were imaged at lower magnification 
(5×) with illumination in the GFP channel at each time point (time-lapse 5 min). 

Medaka fish (kindly provided by J. Wittbrodt) were kept and treated in accordance 
with the German (Tierschutzgesetz) or Italian (decree 116/92) national 
guidelines, and experimental procedures were approved by the Institutional Animal 
Care and Use Committee. Mice were bred in conventional animal facilities at 
IST Austria or at the Institut Curie according to local regulations. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
Supplemental Discussion, five movies, five figures, and one table and 
can be found with this article online at http://dx.doi.org/10.1016/j.cell.2015.01.056. 

SUMMARY 

Despite recent discoveries of genetic variants 
associated with autoimmunity and infection, genetic 
control of the human immune system during homeo- 
stasis is poorly understood. We undertook a compre- 
hensive immunophenotyping approach, analyzing 
78,000 immune traits in 669 female twins. From the 
top 151 heritable traits (up to 96% heritable), we 
used replicated GWAS to obtain 297 SNP associa- 
tions at 11 genetic loci, explaining up to 36% of the 
variation of 19 traits. We found multiple associations 
with canonical traits of all major immune cell subsets 
and uncovered insights into genetic control for 
regulatory T cells. This data set also revealed traits 
associated with loci known to confer autoimmune 
susceptibility, providing mechanistic hypotheses 
linking immune traits with the etiology of disease. 
Our data establish a bioresource that links genetic 
control elements associated with normal immune 
traits to common autoimmune and infectious dis- 
eases, providing a shortcut to identifying potential 
mechanisms of immune-related diseases. 

INTRODUCTION 

The immune system has evolved over millions of years into a 
remarkable defense mechanism with rapid and specific protec- 
tion of the host from major environmental threats and pathogens. 
Such pathogen encounters have contributed to a selection of 
immune genes at the population level that determine not only 
host-specific pathogen responses but also susceptibility to 
autoimmune disease and immunopathogenesis. Understanding 
how such genes interplay with the environment to determine immune 
protection and pathology is critical for unravelling the 
mechanisms of common autoimmune and infectious diseases 
and future development of vaccines and immunomodulatory 
therapies. 

Studies of rare disease established major genes, and their 
associated pathways, that regulate pathogen-specific immune 
responses (Casanova and Abel, 2004) and genome-wide associ- 
ation studies (GWAS) of autoimmune disease have also been 
productive for finding common variants (Cotsapas and Hafler, 
2013; Parkes et al., 2013; Raj et al., 2014). Despite this progress, 
there are still major limitations in our understanding of the ge- 
netics of complex autoimmune or infectious diseases. A key 
missing piece is the elucidation of the genes controlling critical 
components of a normal human immune system under homeo- 
static conditions. These include the relative frequencies of circu- 
lating immune cell subsets and the regulation of cell-surface 
expression of key proteins that we expect have strong regulatory 
mechanisms. 

Previous studies in humans and rodents have shown that vari- 
ation in the levels of circulating blood T cells is in part heritable 
(Amadori et al., 1995; Kraal et al., 1983). Identifying the underly- 
ing genetic elements would help us understand the mechanisms 
of homeostasis and its dysregulation. Twin studies are ideal to 
quantify the heritability of immune traits in healthy humans by al- 
lowing adjustment for the influence of genes, early environment, 
age, and cohort, plus a number of known and unknown con- 
founders (van Dongen et al., 2012). Early studies from our group 
demonstrated genetic control of CD8 and CD4 T cell levels in 
twins (Ahmadi et al., 2001), and others have shown similar heri- 
table effects in non-twins and rodents and with broad white 
cell phenotypes (Amadori et al., 1995; Clementi et al., 1999; 
Damoiseaux et al., 1999; Evans et al., 1999; Ferreira et al., 
2010; Hall et al., 2000; Kraal et al., 1983; Nalls et al., 2011; Okada 
et al., 2011). A recent study, with a family design, was the first to 
perform GWAS on a larger range of immune subtypes. The 
authors analyzed 272 correlated immune traits derived from 95 
cell types and described 23 independent genetic variants within 
13 independent loci (Orru et al., 2013). 

Figure 1. Schematic Representation of Leukocyte Populations Analyzed and Summary Manhattan Plot 

(A) This diagram illustrates the approach to analyzing the immunophenotyping data obtained by flow cytometry. It is not meant to convey differentiation stages of 
leukocyte populations, though that property is largely reflected in this diagram. Each “lineage” of a subset of leukocytes was identified through hierarchical 
gating. Within each of these lineages, all possible combinations of markers with heterogeneous expression within the lineage were analyzed. The number of 
subsets identified by this combinatorial approach is shown in various lineages; the trait analyzed was the CSF within its parent lineage. In addition, the cell SPEL 
was quantified by the median fluorescence intensity of the antibody staining on a given cell subset; the number of SPEL traits is indicated as well. 

Here, we report a comprehensive and high-resolution deep 
immunophenotyping flow cytometry analysis in 669 female 
twins using 7 distinct 14-color immunophenotyping panels that 
captured nearly 80,000 cell types (comprising ~1,800 independent 
phenotypes) to analyze both immune cell subset frequency 
(CSF) and immune cell-surface protein expression levels 
(SPELs). This gave us a roughly 30-fold richer view of the healthy 
immune system than was previously achievable. Taking advan- 
tage of the twin model, we used a pre-specified analysis plan 
that prioritized 151 independent immune traits for genome- 
wide association analysis and replication. 

We find 241 genome-wide significant SNPs within 11 genetic 
loci, 9 of which are previously unreported. Importantly, they 
explain up to 36% of the variation of 19 immune traits (18 previ- 
ously unexplored). We identify pleiotropic “master” genetic loci 
controlling multiple immune traits and key immune traits under 
tight genetic control by multiple genetic loci. In addition, we 
show the importance of quantifying cell-surface antigen expres- 
sion rather than just cell-type frequency. 

Critically, we show overlap between these genetic associa- 
tions of normal immune homeostasis with previously established 
autoimmune and infectious disease associations. This rich data- 
base provides a vital, publicly accessible bioresource as a bridge 
between genetic and immune discoveries that will expedite 
the identification of disease mechanisms in autoimmunity and 
infection. 

RESULTS 

Subjects 

The discovery stage comprised 497 female participants from 
the UK Adult Twin Register (TwinsUK). There were 75 complete 
monozygotic (MZ) twin pairs, 170 dizygotic (DZ) pairs, and 7 sin- 
gletons (arising from quality control [QC] failures in one co-twin). 
The mean age was 61.4 years (range: 40-77). The replication 
stage comprised a further 172 participants, mean age 58.2 
years (range: 32-83), with 46 MZ, 118 DZ, and 8 singletons. 
We stained cryopreserved peripheral blood mononuclear cells 
(PBMC) from each, using a set of 7 14-color immunophenotyping 
panels that delineate a large range of immune subsets (Figures 
1A, S1, S2, and S3 and Table S1). Immune traits analyzed 
included the CSF (i.e., the proportionate representation of a 
given phenotype) and the SPEL (i.e., a quantitative measure of 
gene expression on a per-cell basis). The variability of all traits 
was assessed using longitudinal sampling on a small cohort of 
individuals as described in the Experimental Procedures; of 
the 50,000 traits meeting the first filter criterion (Figure S4), 
the mean covariance across samples drawn 6 months apart is 
0.86. All trait values and summary analyses, including variability, 
are available for download. Data and statistical analysis of the 
discovery stage was completed per a pre-defined statistical 
analysis plan before samples from the replication stage were 
thawed. 

GWAS analysis of all 78,000 immune traits is computationally 
prohibitive and would require a multiple comparisons correction 
that dramatically reduces sensitivity. The ability to infer heritabil- 
ity (proportion of variance explained solely by genetic factors) by 
the use of twins dramatically enhanced our ability to focus on 
those that are most likely to be informative. Co-variation of all 
traits was computed; about 1,800 were independent at r < 0.7 
(Figure S4). 
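
The reduction from correlated traits to an approximately independent set can be pictured with a small sketch. It assumes a greedy filter in which a trait is retained only if its absolute Pearson correlation with every previously retained trait is below 0.7; the text does not state which procedure was actually used, so the function names, the ordering rule, and the toy values below are purely illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_correlated(traits, threshold=0.7):
    """Greedy filter: keep a trait only if |r| < threshold versus all kept traits.

    `traits` maps a trait name to its per-sample values; traits are visited
    in insertion order, which is an arbitrary choice made for this sketch.
    """
    kept = []
    for name, values in traits.items():
        if all(abs(pearson(values, traits[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (hypothetical values, five samples per trait).
traits = {
    "trait_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "trait_b": [1.1, 2.0, 2.9, 4.2, 5.1],    # nearly collinear with trait_a
    "trait_c": [2.0, -1.0, 0.5, -0.5, 1.0],  # weakly correlated with trait_a
}
independent = prune_correlated(traits)
```

A constant trait would make the correlation undefined, so real data would need a variance check before this step.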

We found no significant association of the analyzed traits with 
self-reported tobacco use or alcohol consumption and so did not 
include those behaviors as covariates. We identified many traits 
associated with age and included age as a covariate in all ana- 
lyses. Notably, an advantage of using a twin-based cohort is to 
render age and other cohort effects minimally impactful. The 
age range of our cohort was optimal for our goal of identifying im- 
mune traits associated with genetic elements that show a risk for 
autoimmune diseases. Because incidence for such diseases 
often increases with age, the greatest power for such correlations 
will be obtained using sample measurements most proximal 
to the common onset of disease. 

Heritability 

Falconer’s traditional formula (twice the difference in intraclass 
correlations) was used to roughly estimate the heritabilities of 
all 78,000 immune traits; after ranking, traits were selected for 
further pre-specified analyses (Figure S4). Variance components 
analysis (additive genetics, common environment, and unique 
environment, or ACE model) was used to more precisely esti- 
mate heritabilities of chosen traits. The heritabilities ranged 
widely from 0% (suggesting purely environmental or stochastic 
influences) to 96% (e.g., CD32 expression on dendritic cells), 
indicating a strong genetic effect. Figure S5 shows the range 
of heritabilities for selected traits, and the components of the 
model are tabulated in Table S2 with full trait descriptions. 
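
The rough heritability estimate described above can be sketched as follows. Falconer's formula takes twice the difference between the monozygotic and dizygotic intraclass correlations, h² = 2(r_MZ − r_DZ); the companion estimates c² = 2r_DZ − r_MZ and e² = 1 − r_MZ are the usual counterparts for shared and unique environment. The one-way ANOVA form of the intraclass correlation, the function names, and the toy twin values are assumptions made for illustration, and the ACE variance components analysis used for the final estimates is not reproduced here.

```python
def intraclass_correlation(pairs):
    """One-way ANOVA intraclass correlation for twin pairs (two values per pair)."""
    n = len(pairs)
    grand_mean = sum(a + b for a, b in pairs) / (2 * n)
    # Between-pair mean square (n - 1 degrees of freedom).
    ms_between = 2 * sum(((a + b) / 2 - grand_mean) ** 2 for a, b in pairs) / (n - 1)
    # Within-pair mean square (n degrees of freedom).
    ms_within = sum((a - b) ** 2 / 2 for a, b in pairs) / n
    return (ms_between - ms_within) / (ms_between + ms_within)


def falconer(r_mz, r_dz):
    """Falconer's estimates from MZ and DZ intraclass correlations."""
    return {
        "h2": 2 * (r_mz - r_dz),  # additive genetic variance (heritability)
        "c2": 2 * r_dz - r_mz,    # common (shared) environment
        "e2": 1 - r_mz,           # unique environment and measurement error
    }


# Toy trait values for twin pairs (hypothetical numbers, not study data).
mz_pairs = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
dz_pairs = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 3.0), (5.0, 4.0)]
estimates = falconer(intraclass_correlation(mz_pairs),
                     intraclass_correlation(dz_pairs))
```

Because the three estimates are not constrained, sampling noise can push them below 0 or above 1, which is one reason a fitted variance components model is preferred for the traits taken forward.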

GWAS of Immune Traits 

Single-variant associations were performed on 151 immune traits 
selected for high heritability or biological interest, comprising 
cell frequency (129 CSFs) and cell-surface protein expression 
(22 SPELs). Many significant associations were found despite 
the stringent Bonferroni multiple testing threshold of p < 3.3 × 
10⁻¹⁰. We also performed a conditional analysis, including the 
top SNP of each locus as a covariate, to identify potential 
independent secondary signals. This analysis did not reveal any 
significant evidence for additional independent signals. 
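
The significance threshold follows from dividing the standard genome-wide threshold of 5 × 10⁻⁸ by the 151 traits tested, which gives approximately 3.3 × 10⁻¹⁰. The short sketch below reproduces that arithmetic; the constant and function names are illustrative, and the two p values used in the example calls are arbitrary test inputs.

```python
GENOME_WIDE_ALPHA = 5e-8   # standard genome-wide significance threshold
N_TRAITS = 151             # number of immune traits tested

# Bonferroni adjustment: divide the per-test threshold by the number of tests.
threshold = GENOME_WIDE_ALPHA / N_TRAITS


def is_significant(p_value):
    """True if a discovery-stage p value passes the adjusted threshold."""
    return p_value < threshold


print(f"adjusted threshold: {threshold:.1e}")
print(is_significant(6.5e-11), is_significant(1e-9))
```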

Six SPELs were significant (Table 1), with the strongest between 
MFI:516 (CD39 SPEL on CD4 T cells) and the ENTPD1 
(CD39 gene) SNP rs7096317 (p = 9.4 × 10⁻⁴⁰). Many other variants 
of ENTPD1 were also associated with this trait (Table S3). 
Expression of five others (MFI:189, MFI:212, MFI:231, MFI:504, 
and MFI:552, which include CD27 expression on B and T cell 
subsets, and CD161 expression on CD4 T cells) were associated 
with variants on chromosome 1q23 in a genetic region containing 
the important immune-regulating genes FCGR2A, FCGR2B, 
and FCRLA (Table 1). These associations were independently 
verified in the replication cohort, and the combined discovery 
and replication set p values of the 6 SPELs ranged from 2.8 × 
10⁻¹¹ to 1.6 × 10⁻⁵⁴ (Table 1 and Figure 1B). Table S3 illustrates 
other examples of genetic control of cell-surface expression, 
including the expression of CD11c, CD123, and CD274 on 
myeloid subsets. 

Figure 1 (continued). 

(B and C) Summary Manhattan plots: green dots, genome-wide significant associations (p < 5 × 10⁻⁸). The red line indicates the significance 
threshold of p < 3.3 × 10⁻¹⁰, which corresponds to the standard genome-wide threshold after further adjustment for 151 independent tests. 
The variants shown are MAF > 0.1; call rate > 0.9; HWE p value > 1 × 10⁻⁸. Shown are separate plots for SPEL associations (B) and CSF 
associations (C). 

Table 1. Discovery and Replication Results for the Top Significant SNPs at Each Locus for Each Immune Trait 

Locus: Genes | Trait ID | Trait Phenotype | Marker | Chr | EA/NEA | EAF | Discovery Beta (SE) | Discovery p Value | Replication Beta (SE) | Replication p Value | Combined Beta (SE) | Combined p Value 
1: FCGR2A, FCGR2B, FCRLA | MFI:189 | CD27 on IgA+ B | rs1801274 | 1 | A/G | 0.49 | 0.128 (0.02) | 6.48E-11 | 0.07 (0.03) | 3.70E-02 | 0.11 (0.02) | 2.8E-11 
1: FCGR2A, FCGR2B, FCRLA | MFI:212 | CD27 on IgG+ B | rs1801274 | 1 | A/G | 0.49 | 0.136 (0.02) | 5.38E-12 | 0.12 (0.03) | 1.11E-04 | 0.13 (0.02) | 2.9E-15 
1: FCGR2A, FCGR2B, FCRLA | MFI:231 | CD161 on CD4 T | rs1801274 | 1 | A/G | 0.49 | 0.131 (0.02) | 2.64E-11 | 0.12 (0.03) | 2.17E-04 | 0.13 (0.02) | 2.7E-14 
1: FCGR2A, FCGR2B, FCRLA | MFI:504 | CD27 on CD4 T | rs1801274 | 1 | A/G | 0.49 | 0.145 (0.02) | 5.42E-14 | 0.14 (0.03) | 4.20E-05 | 0.14 (0.02) | 1.1E-17 
1: FCGR2A, FCGR2B, FCRLA | MFI:552 | CD27 on CD8 T | rs1801274 | 1 | A/G | 0.49 | 0.186 (0.02) | 1.26E-21 | 0.12 (0.03) | 1.72E-04 | 0.17 (0.02) | 4.2E-24 
1: FCGR2A, FCGR2B, FCRLA | P7:110 | imDC: %CD32+ | rs10494359 | 1 | C/G | 0.12 | 0.343 (0.03) | 2.52E-29 | 0.43 (0.05) | 1.05E-15 | 0.36 (0.03) | 5.9E-43 
1: FCGR2A, FCGR2B, FCRLA | P7:224 | CD1c+ mDC: %CD32 | rs4657090 | 1 | A/G | 0.27 | -0.174 (0.02) | 1.30E-14 | -0.19 (0.04) | 3.86E-06 | -0.18 (0.02) | 2.7E-19 
2: NFIA | P4:3551 | NK: %CD314−CD158a+ | rs12072379 | 1 | G/C | 0.16 | -0.131 (0.02) | 1.73E-10 | -0.10 (0.05) | 4.87E-02 | -0.13 (0.02) | 2.7E-11 
3: NRXN1 | P4:3551 | NK: %CD314−CD158a+ | rs17040907 | 2 | T/C | 0.07 | -0.208 (0.03) | 2.68E-10 | -0.16 (0.08) | 4.22E-02 | -0.02 (0.03) | 3.9E-11 
4: PRKCI | P4:3551 | NK: %CD314−CD158a+ | rs2650220 | 3 | G/A | 0.16 | -0.15 (0.02) | 3.18E-10 | -0.10 (0.05) | 4.55E-02 | -0.14 (0.02) | 6.0E-11 
5: NT5E | P2:4195 | CD4 T: %CD39−CD73+ | rs9444346 | 6 | G/A | 0.19 | -0.2 (0.03) | 1.18E-14 | -0.12 (0.04) | 4.98E-03 | -0.18 (0.02) | 8.8E-16 
RP11-30P6 | P2:4204 | CD4 T: %CD73+ | rs9444346 | 6 | G/A | 0.19 | -0.195 (0.03) | 5.85E-14 | -0.12 (0.04) | 2.82E-03 | -0.18 (0.02) | 1.8E-15 
6: SLC18A1 | P4:3551 | NK: %CD314−CD158a+ | rs1390942 | 8 | T/C | 0.15 | -0.163 (0.02) | 1.39E-15 | -0.20 (0.05) | 1.70E-04 | -0.17 (0.02) | 1.4E-18 
7: SLC25A16 | P4:3551 | NK: %CD314−CD158a+ | rs3017072 | 10 | T/C | 0.15 | -0.153 (0.02) | 2.75E-13 | -0.15 (0.05) | 2.08E-03 | -0.15 (0.02) | 2.2E-15 
8: FAS, ACTA2 | P1:6601 | CD8 T: %TSCM | rs7097572 | 10 | C/T | 0.48 | -0.168 (0.02) | 8.51E-16 | -0.18 (0.03) | 2.72E-06 | -0.17 (0.02) | 1.3E-20 
9: ALDH18A1, ENTPD1, ENTPD1-AS1, RP11-7D5, SORBS1, TCTN3 | MFI:516 | CD39 on CD4 T | rs7096317 | 10 | G/A | 0.42 | -0.255 (0.02) | 9.40E-40 | -0.30 (0.04) | 9.92E-17 | -0.27 (0.02) | 1.6E-54 
9: ALDH18A1, ENTPD1, ENTPD1-AS1, RP11-7D5, SORBS1, TCTN3 | P2:10491 | CD8 T: %CD39+ | rs4074424 | 10 | G/C | 0.42 | -0.219 (0.02) | 4.11E-27 | -0.19 (0.04) | 2.40E-07 | -0.21 (0.02) | 8.2E-33 
9: ALDH18A1, ENTPD1, ENTPD1-AS1, RP11-7D5, SORBS1, TCTN3 | P2:3460 | CD4 T: %CD39+CD38+PD1− | rs4582902 | 10 | C/T | 0.47 | -0.164 (0.02) | 4.55E-16 | -0.19 (0.03) | 2.58E-08 | -0.17 (0.02) | 9.2E-23 
9: ALDH18A1, ENTPD1, ENTPD1-AS1, RP11-7D5, SORBS1, TCTN3 | P2:4159 | CD4 T: %CD39+CD73− | rs6584027 | 10 | G/A | 0.47 | -0.212 (0.02) | 1.54E-25 | -0.20 (0.04) | 1.09E-08 | -0.21 (0.02) | 1.1E-32 
9: ALDH18A1, ENTPD1, ENTPD1-AS1, RP11-7D5, SORBS1, TCTN3 | P2:4186 | CD4 T: %CD39+ | rs6584027 | 10 | G/A | 0.47 | -0.215 (0.02) | 2.20E-26 | -0.21 (0.04) | 5.27E-09 | -0.21 (0.02) | 7.8E-34 
9: ALDH18A1, ENTPD1, ENTPD1-AS1, RP11-7D5, SORBS1, TCTN3 | P2:4213 | CD4 T: %CD39+CD73+ | rs10882676 | 10 | A/C | 0.47 | -0.195 (0.02) | 6.76E-22 | -0.19 (0.04) | 2.19E-07 | -0.19 (0.02) | 9.0E-28 
10: KLRC1, KLRC2, KLRC4, KLRK1, RP11-277P12 | P4:3551 | NK: %CD314−CD158a+ | rs2734565 | 12 | C/T | 0.3 | -0.144 (0.02) | 1.34E-10 | -0.18 (0.04) | 1.67E-05 | -0.15 (0.02) | 1.4E-14 
10: KLRC1, KLRC2, KLRC4, KLRK1, RP11-277P12 | P4:4832 | NK: %CD314−CCR7− | rs2734565 | 12 | C/T | 0.3 | -0.233 (0.02) | 1.27E-24 | -0.33 (0.04) | 3.95E-15 | -0.26 (0.02) | 2.7E-37 
10: KLRC1, KLRC2, KLRC4, KLRK1, RP11-277P12 | P4:5538 | NK: %CD314−CD335+ | rs2734565 | 12 | C/T | 0.3 | -0.275 (0.02) | 3.40E-34 | -0.38 (0.04) | 2.27E-19 | -0.30 (0.02) | 6.4E-51 
11: FTC | P4:3551 | NK: %CD314−CD158a+ | rs1420318 | 16 | A/G | 0.1 | -0.146 (0.02) | 9.34E-11 | -0.19 (0.06) | 4.05E-02 | -0.14 (0.02) | 1.2E-11 

For the Discovery stage, we used a significance threshold of p < 3.3 × 10⁻¹⁰. This threshold corresponds to the standard genome-wide threshold of 5 × 10⁻⁸ after further adjustment for 151 
independent tests. Orru et al. (2013) also identified Locus 1 (associated with a single trait, CD62L− dendritic cells, not measured in our panel), and Locus 9 (associated with a single trait: 
CD4 T cell frequency, P2:4186 in our list). The trait ID is fully described in Table S2. 

Overall, 241 SNP variants with a minor allele frequency above 
5% were significantly associated with various SPELs (Table S3); 
of these, 35 SNPs were pleiotropically associated with multiple 
SPELs. 

Genetic control of SPEL may simply be due to promoter/enhancer 
element variants or more complex regulation of transcription, 
translation, or protein localization. In contrast, genetic 
control of CSF may reveal homeostatic mechanisms regulating 
cell subset representation in the blood. Genome-wide significant 
associations were identified with 13 different CSFs (Figure 1C 
and Table 1). Nearly all were verified in the replication cohort 
(Table 1), and some reached a p value of 10⁻⁵¹. 

Suggestive associations, which did not meet the conservative 
significance threshold of 3.3 × 10⁻¹⁰, were also identified for 
numerous SPELs and CSFs (Tables S3 and S4). The associations 
that were independently replicated (replication p < 0.05), 
as well as meta-analyzed variants reaching p < 5 × 10⁻⁸, are 
reported in Table S6. 

Genetic Control of Treg Cells 

One of the most heritable traits identified from our staining 
panels was the frequency of CD39+ cells within the CD4 
compartment (Figure 2), as previously reported (Orru et al., 
2013). CD39+ CD4 T cells, as well as CD73+ CD4 T cells, have been 
identified functionally as T regulatory (Treg) cells (Borsellino 
et al., 2007), a key subset in the modulation of immune responses 
(Antonioli et al., 2013). 

The heritability of CD39+ CD4 T frequency was 89% (95% CI: 
66%-93%) (Figure 2A). GWAS analysis revealed a single locus 
on chromosome 10 that was highly associated with the trait (Figure 
2B); this locus maps to the CD39 gene itself. Quantification of 
the expression of CD39 on a per-cell basis (i.e., SPEL) revealed 
that the basis for this association was an “on/off” control of the 
expression of the CD39 molecule on the cells, rather than a 
homeostatic regulation of the circulating levels of these cells 
(i.e., CSF). Specifically, individuals who are homozygous for 
rs7096317A express the highest amount of this protein on the 
cell surface; heterozygotes expressed half as much; and 
rs7096317G homozygotes expressed virtually none (Figures 
2C and 2D). Although the A/G heterozygotes have a significantly 
decreased CD39 SPEL, the cells express enough so as to remain 
CD39+. Thus, in the analysis of the CD39 CSF by genotype, only 
the G/G homozygotes have a reduced frequency of this population 
(Figure 2E). This illustrates the power of the SPEL analysis to 
deconvolute potential mechanisms of genetic control that are 
missed by simple analysis of CSF. 

Similarly, the frequency of CD4 T cells that are CD25+CD127−CD45RO+ 
but do not express CD39 was also strongly associated 
with this same locus, showing the opposite association 
(Figure 2F). In other words, genetic control is not over the frequency 
of Treg (CD4+CD25+CD127−CD45RO+) but over the 
quantitative expression of CD39 (the cell phenotype). Notably, 
this genetic control also extends to lymphocytes that are not 
Treg: a similar genotypic association (Figure S6) was found for 
the relatively rare CD8+ and CD4−CD8− T cells expressing 
CD39 (Figures S7A and S7B). Finally, CD73 is an ectonuclease 
similar to CD39, and its expression has also been associated 
with Treg cells (Antonioli et al., 2013). The expression of CD73 
was also found to be genetically controlled (Figure S7C) and 
associated with a single locus on chromosome 6 mapping to 
the CD73 gene itself. 

Thus, the main genetic control of CD39+ Treg appears to originate 
from a transcriptional or post-transcriptional regulation 
leading to the presence or absence of this protein on the cell surface 
of Treg cells; for those Treg defined on a basis independent 
of CD39, we found no evidence of genetic control over their 
representation in blood. 

Genetic Influences on Leukocyte Differentiation 

In virtually every leukocyte population, we found examples in 
which the frequency of certain differentiation stages was heritable 
(Figure 3). In some cases, despite a very high heritability, we 
were unable to identify genetic variants that correlated with the 
trait. For example, the frequency of a CD4 transitional memory 
(Ttm) phenotype (CD28+CD127−), which comprises 15% to 
20% of CD4 T cells, was very strongly heritable (Figure 3A) but 
did not correlate with any SNP genotypes. We found similar examples 
of strong heritability without genetic associations for 
other T cell stages, including recent thymic emigrants (RTE) 
and central memory (Tcm). This was not unexpected: our study 
was only powered to find large effect sizes of gene variants, 
which is unusual for most traditional disease GWASs that need 
thousands of subjects per association. This suggests that 
possibly multiple genes of modest influence act on these phenotypes. 
Despite the lack of defined genetic association, the observation 
of strong heritability indicates that these cell types play an 
important and unique role in immunity, such that their numbers 
are strongly regulated. 

For a number of leukocyte lineages, we were able to identify 
genetic associations with differentiation stages and illustrate 
four examples. (1) Within B cells, the proportion that is immature 
(CD10+) is associated with the genotype of the membrane 
metallo-endopeptidase (MME) gene (Figure 3B). (2) The proportion 
of Th22 CD4 T cells (CXCR3−CCR4+CCR6+CCR10+) is 
strongly associated with a single locus on chromosome 16, mapping 
to the SPG7 gene, which codes for paraplegin, an important 
protein in mitochondrial function (Figure 3C). (3) Within natural 
killer (NK) cells, the proportion of cells that express CD335 but 
not CD314 (Figure 3D) maps to the KLRC4 gene (Figure 3E). 
This association is much more profound for NK cells that are in 
an early differentiation stage (CD56+CD16−) and becomes less 
strong as the cells mature (Figure 3F). This indicates that a mechanism 
is evinced in a differentiation stage-specific manner. (4) The 
proportion of T cells that are “stem cell memory” cells (Tscm; the 
earliest memory stage) is heritable (Figure 3G) and associated 
with a genetic locus containing FAS (CD95) (Figure 3H). This 
association was much stronger for CD4 than for CD8 T cells. 
Tscm are precursor cells that have tremendous proliferative capacity 
and can regenerate all other memory T cell populations 
(Gattinoni et al., 2011; Lugli et al., 2013). Interestingly, this 
same locus also has a significant association with the fraction 
of T cells that are CD8 (Figure 3I), demonstrating multiple (pleiotropic) 
effects of the FAS gene on T cell differentiation. 

Figure 2. Genetic Associations with Treg Phenotype Cells 

(A) The correlation of the fraction of CD4 T cells that are CD39+ in dizygotic twins (top) and monozygotic twins (bottom). The linear correlation, r, is shown for each 
comparison. 

(B) Locus plot showing significant effect of individual SNPs on CD39 expression on CD4 T cells. 

(C) Shown are the expression profiles of CD39 and CD25 for the subset of CD4 T cells that are CD45RO+CD127− for two pairs of dizygotic twins discordant for the 
rs7096317 allele (in the CD39 gene locus). Within each graphic is shown the fraction of cells in the upper two quadrants and the surface protein expression level 
(SPEL) of CD39 for the cells in the upper right quadrant, as well as the genotype of each individual. 

392 Cell 161, 387-403, April 9, 2015 ©2015 Elsevier Inc. 

Pleiotropic Impact of the FcRG2 Locus 

The locus with the widest range of impacts on leukocyte subset 
phenotype and frequency was on chromosome 1, a region 
including the FcRG2 gene. This locus is well known for its 
association with a variety of autoimmune and inflammatory dis- 
eases, including systemic lupus erythematosus (SLE), Kawasa- 
ki’s disease, inflammatory bowel disease, Crohn’s disease, 
type 1 diabetes, and HIV disease progression. Despite the genes 
in this locus being primarily expressed on myeloid and B 
lymphoid cells, many of these diseases are traditionally associ- 
ated with T cell dysregulation. 

The strongest association (e.g., SNP rs1801274) we identified 
for this locus was with the expression of CD32 (FcRG2a and/or 
FcRG2b: these are indistinguishable by the monoclonal antibody 
used in our panel) on the surface of inflammatory myeloid den- 
dritic cells (imDC; Figures 4A and 4B). The heritability of this trait 
was extremely high at 96% (CI: 81%-97%). 

The genetic control of the expression of CD32 on imDCs was 
not seen in all cell populations. For example, B cells showed no 
control (Figure 4C), whereas the expression of CD32 on imDCs is 
associated with the number of rs1801274 “T” alleles (Figures 4C 
and 4D). 

The rs1801274 genotype has been strongly associated with 
susceptibility to SLE, as well as another SNP in the same locus, 
rs10800309. This latter SNP has also been associated with ulcerative 
colitis. The frequency of CD32+ imDCs is strongly affected 
by the genotypes at both of these loci (Figures 4D and 4E); how- 
ever, the distribution of expression for either locus is not uniform: 
high, intermediate, and low expressors can be found within all 
genotypes with differing frequencies. However, when the two 
genotypes are taken together as a diplotype, a powerful and 
replicated association becomes evident for CD32+ imDCs (Figures 
4E and S8C). The impact of this diplotype on CD32 expression 
extends to other myeloid subsets (Figures 4F and S8A). 
Statistical significance of the association is greatest for mono- 
cytes, although the dynamic range in the expression levels 
is not as wide as it is for imDCs. Other subsets, such as the pro- 
fessional antigen presenting mDC, show a muted control of 
expression; CD11c+CD123+ DC, like B cells, show no differential 
regulation of CD32 expression at all. 

Given the profound impact of these genotypes on particular 
subsets, it raises the possibility that part of the increased sus- 
ceptibility to associated autoimmune diseases may be a conse- 
quence of the altered function of cells like imDCs by virtue of a 
differential expression of the activating (CD32a) or repressing 
(CD32b) proteins that we identify here. This is perhaps driven 
by a SNP in the promoter/enhancer areas in high linkage disequilibrium 
with the commonly studied coding SNP rs1801274. 

We also found a remarkable range of effects of the FcRG2 lo- 
cus on a variety of lymphocyte subsets (Figure 5). For example, 
the proportion of early NK cells that are CD2+CD158a+CD158b+ 
is strongly associated with SNP rs365264 (Figure 5A), located 
between CD32a and CD16 (Figure 4B). The rs1801274 coding 
SNP in the locus was associated with phenotypes on both B cells 
and T cells, including the fraction of memory IgG+ (Figure 5B) or 
IgA+ (Figure S8B) B cells that express CD27, as well as the CD27 
expression level on a per cell basis. Interestingly, in this case, the 
higher surface expression levels of CD27 (SPEL) are associated 
with lower frequencies of cells that express CD27 (CSF). Thus, in 
contrast to the example of CD39^ Treg (Figure 2), differential 
regulation of CD27 protein expression does not account for dif- 
ferential frequency of these cell subsets. 

T Cells, FcRG2, and Autoimmune Disease 

Similar to the case for IgG+ B cells, CD8 T cells also exhibited
higher CD27 expression in association with the rs1801274 T allele
(Figure 5C); this was also true for other T cell lineages (Figure S8).
Furthermore, a population of CD4 T cells that express CD161 is
also strongly associated with this same genotype (Figure 5D).
Importantly, CD161+ T cells are either Th17 or mucosal-associated
invariant T (MAIT) cells, important for maintenance of mucosal
integrity. Thus, we define an impact of specific gene variants on
important T cell phenotypes closely related to their activation potential,
which may underlie the associations with T-cell-based
autoimmune diseases.

Finally, Starbeck-Miller et al. (2014) recently demonstrated 
that CD8 T cells can express CD32b and that this expression 
was functionally important in modulating cytolytic T cell re- 
sponses. Here, we demonstrate that expression of CD32 on 
CD8 T cells is low and variable between individuals (Figure 5E). 
Notably, this expression shows a very strong association with
the rs1801274:rs10800309 diplotype of the FCRG2 gene locus
(Figure 5F). This suggests that the regulation of surface expres- 
sion of this negative regulatory molecule on CD8 T cells (Star- 
beck-Miller et al., 2014) has a common expression mechanism 
to that in myeloid populations. Gene variants increasing suscep- 
tibility to SLE also associate with lower levels of this negative 
regulatory protein on CD8 T cells and imDC. Together, these 
data provide a possible direct link between the SNPs highly 
associated with autoimmune diseases and T cell phenotypes 
that might account for the pathogenesis. 

Overlap with Disease Associations 

The Catalog of Published Genome-wide Association Studies 
(http://www.genome.gov/gwastudies/) and ImmunoBase (http:// 
www.immunobase.org/) were used to evaluate the overlap be- 
tween genetic variants associated with CSFs and SPELs in our 
study and those reported to be suggestively or statistically 



(D) The CD39 SPEL of CD39-positive cells is graphed by the genotype of rs7096317; the dotted line indicates the threshold of positivity above
which a cell was considered CD39+. In the C/C genotype, relatively few cells are above this threshold and the median fluorescence intensity values are not
robust.

(E and F) The fraction of CD4 T cells of the designated phenotype is graphed by the rs7096317 genotype. Bars indicate interquartile range.

Figure 3. Genetic Associations with Lymphocyte Differentiation

(A) The proportion of CD4 T cells that are “transitional memory” (CD28+CD127−) is shown for DZ and MZ twins.

(B) The proportion of B cells that are immature is shown for twins (left) and is strongly associated with the genotype of rs10513469 (MME gene) (right).

(C) The proportion of CD4 T cells that are Th22 (CXCR3−CCR4+CCR6+CCR10+) is associated with the genotype of rs2019604 (SPG7 gene).

(D) The frequency of four phenotypes within NK cells (designated as “A”...“D” based on the expression of CD314 [KLRC4] and CD335) is shown for “early”
(CD56+CD16−) differentiated NK cells.

(E) The proportion of early NK cells that are CD314−CD335+ (population “A”) is shown for DZ and MZ twins (left). (Right) The genotypes of rs1841957 (near the
KLRC4/CD314 locus) strongly associate with the frequency of CD314−CD335+ cells among early NK.

(F) The associations of rs1841957 with all four phenotypes within differentiation stages of NK cells are shown by p value.
significant in candidate SNP, candidate gene, or genome-wide
association studies of complex and infectious diseases. 

SNPs that were highly correlated (r² ≥ 0.8) with the variants significantly
associated with our immune traits were also interrogated for overlap with
reported disease associations using the appropriate thresholds for
the number of tests. A number of gene variants significantly and 
suggestively (p < 10“^) correlated with CSFs and SPELs have 
been reported in associations with complex and infectious diseases,
as shown in Table 2. The different gene variants of
FCGR2A, associated with a range of myeloid and T cell phenotypes
in our data, were reported to be associated with increased
risk of a number of diseases, including inflammatory bowel disease,
ulcerative colitis, SLE, Kawasaki disease, ankylosing
spondylitis, and HIV progression (Table 2). An additional variant
of FCGR2A, rs10494359 (associated with P7:100 [CD64−CD274−
imDCs]), is closely correlated (in linkage disequilibrium) with
rs10494360 (r² = 0.941), which has been associated with rheumatoid
arthritis. Juvenile idiopathic arthritis and chronic lymphocytic
leukemia susceptibility loci in the ACTA2/FAS region of
chromosome 10q23.31 were also associated with the frequency
of P1:6601 (CD4 Tscm) (P = 4.1 × 10⁻¹²; Table 2). The Behçet’s
disease susceptibility variant in the killer cell lectin-like receptor
gene region corresponded with the frequencies of P4:3551,
P4:4832, and P4:5538 (all three are CD314− subsets of NK cells).
The correlations of variants associated with tuberculosis, malaria,
leprosy, HIV, and hepatitis B and C with our immune traits are
presented in Table 2.

DISCUSSION 

Understanding the fundamental principles of how the immune 
system protects the host from infection yet also contributes to 
autoimmunity and other disease pathogenesis is essential for 
the development of novel diagnostics and medicines. There re- 
mains a major gap in our understanding of genetic determinants 
of a normal human immune system and its main coordinates 
such as the frequency of immune cells and expression of rele- 
vant proteins. Using 669 twins and the richest immunophenotyp- 
ing performed to date, we investigated the genetic architecture 
of immune traits. We describe multiple independent genetic var- 
iants at several genetic loci explaining a substantial proportion 
(up to 96%) of the genetic variation. We identify both pleiotropic 
genetic loci that control multiple immune traits and single im- 
mune traits under genetic influence by multiple loci. For certain 
canonical immune traits, genetic control is exerted at the level 
of immune cell-surface protein expression (i.e., a consequence 
of promoter/enhancer or signaling mechanisms) rather than at 
the level of cell subset frequency (i.e., homeostasis or differenti- 
ation mechanisms). We further describe multiple genetic associ- 
ations with common canonical immune traits related to leuko- 
cyte lineage and differentiation of major immune cell subsets 
such as B cells, T cells, and natural killer cells. Finally, we identify 
genetic elements associated with both immune traits and autoimmune
and infectious diseases. Providing the heritabilities of
thousands of cell subtypes plus a basis to uncover the genetic 
architecture of the numerous gene-immune associations estab- 
lishes this data set as an essential bioresource for researchers. 
The remarkably strong associations we find for genetic traits 
linked to disease illustrate the power of our approach by using 
twins and optimized high-quality immune phenotyping. 

Some limitations of our study should be noted: the cohort used 
is all female, it is (for GWAS) a relatively limited sample size, and it 
is relatively homogeneous in terms of environmental exposure. 
The low number of genetic associations on chromosome 6
(the major histocompatibility region) is possibly explained by 
the considerable complexity and polymorphism in this gene re- 
gion, which would require larger sample sizes to obtain statisti- 
cally significant genetic associations. With regard to immunolog- 
ical traits, it should be noted that our discovery cohort ranges in 
age from 41 to 77. It is possible that analysis of a younger cohort, 
for which less environmental pressure on the immune system 
has occurred, would reveal stronger associations; on the other 
hand, it is likely that the greatest power to detect immune corre- 
lates related to disease will come from measurements at a time 
most proximal to the typical onset. Nevertheless, our success in 
identifying a large number of genetic variants with genome-wide 
significance validates our approach of focusing on well-defined 
and curated immune phenotypes. It should be noted that 
the 297 SNPs we report (Table S5) are those that attained 
genome-wide significance (with a conservative correction for 
multiple comparisons) in both the discovery and replication cohorts.
Many more associations are evident in the data set (e.g.,
Tables S2 and S3), which can serve to formulate new testable hypotheses
and genetic studies.

An example of the power of the resource was in distinguishing 
two important mechanisms that lead to differential representation
of immune cell phenotypes (e.g., CD39+ CD4 T cells). The
first is homeostatic — i.e., mechanisms that control the recircula- 
tion, proliferation, and elimination of a certain cell type in the 
blood — a cellular mechanism expressed at the whole-body level. 
Such mechanisms, although unidentified, are known to exist for 
regulating major subset numbers (such as CD4 T cell numbers). 
A second, completely independent mechanism is the molecular 
regulation of the protein expression at the cell level itself (e.g., 
promoter/enhancer variants). Thus, even in the presence of 
intact homeostatic mechanisms regulating that cell type, a 
reduction in the number of cells expressing a protein (perhaps 
part of the defining phenotype) may be due to a specific pro- 
moter variant that simply abrogates the expression of the 
gene. Here, we show examples of both mechanisms. 

An example of a major immune trait under genetic control is 
the phenotype, but not frequency, of regulatory T (Treg) cells. 
Treg cells are essential for maintenance of immune homeosta- 
sis, and their dysregulation might lead to autoimmunity (Sakagu- 
chi et al., 2010). A pleiotropic locus containing the ectonucleo- 
side triphosphate diphosphohydrolase 1 (ENTPD1 and CD39) 



(G) The proportion of CD4 T cells that are “stem cell memory” (Tscm: CD45RA+CD95+CD27+CD28+CD127+CD57−) is shown for DZ and MZ twins.

(H and I) The genotypes of rs7069750 (FAS gene) are associated with the proportion of CD4 and CD8 T cells that are Tscm, as well as the proportion of all T cells
that are CD8.

Bars indicate interquartile ranges.

Figure 4. Genetic Associations of the FcR Locus with Myeloid Immunophenotypes

(A) The correlation of the fraction of imDCs that are CD32+ is shown for dizygotic twins (top) and monozygotic twins (bottom). The linear correlation, r, is shown for each
comparison.



gene controls several phenotypic features of CD39+ Treg.
Although we confirm an apparent association of this gene locus
with the frequency of circulating CD39+ Treg (Orru et al., 2013),
we show here that this is a consequence solely of altered phenotype.
Quantitative analysis of CD39 protein expression on the
cell surface demonstrates that genetic control of CD39+
Treg is exerted at the level of surface protein expression, rather
than cell frequency (homeostasis). This establishes a paradigm
for future studies of immune traits and points to the necessity
of including both cell frequency and cell protein expression analyses
in immunogenetic studies of the human immune system.
We also describe the association of a genetic locus containing
the ecto-5'-nucleotidase (NT5E and CD73) gene with a population
of CD73+ Treg. From a functional perspective, it is of interest
that both CD73 and CD39 are ectonucleotidases involved in the
generation of immunosuppressive adenosine that alters T cell
and NK cell activities (Deaglio et al., 2007). Thus, it appears
that the CD39 and CD73 adenosine immunosuppressive
pathway has been under evolutionary selection and might therefore
be a critical determinant of functional Treg activity (Bastid
et al., 2013) and establishment of tissue homeostasis. Importantly,
we conclude that there is no genetic control of the frequency
of Treg (i.e., CD127−CD25+ memory CD4 cells), but
rather, control is evinced at the level of their specific phenotype
and presumably function. Furthermore, the lack of a known disease
association with this locus calls into question the importance
of CD39 expression for Treg function.

We discovered several genetic associations with immune
traits relevant to lymphocyte lineage and differentiation. A genetic
locus containing the cell-surface death receptor FAS
(CD95) was associated with the frequency of circulating T stem
cell memory cells (Tscm). Tscm are a recently described infrequent
and functionally important lymphocyte subset (Gattinoni
et al., 2011); these cells have a largely naive T cell phenotype
but are able to self-renew while displaying functional attributes
of memory cells. Genetic control at the level of CD95 suggests
a potential role of CD95 in the control of T stem cell homeostasis
for not only differentiation stages such as Tscm but total CD8 as
well (Figure 3). It further provides a possible link between human
autoimmune syndromes based on genetic CD95 deficiency
(Strasser et al., 2009) and Tscm.

A genetic locus within a cluster of genes referred to as the
“NK complex” containing NKG2D (CD314 and KLRK1) was
associated with the frequency of a distinct population of
CD314−CD335+ “early” (CD56+CD16−) NK cells. The NKG2D
gene encodes a C-type lectin protein preferentially expressed
in NK cells. It binds to a diverse family of stress-induced ligands
that include MHC class I chain-related A and B proteins (MICA
and MICB), essential for the activation of T cells and NK cells
(Raulet et al., 2013). These data establish an unexpected link between
the genetic control of the frequency of a specific subset of
NK cells and a gene locus containing major genes with functional
relevance to NK cell activity and their activation by stressed
normal tissue cells or tumor cells.

One locus in chromosome 1q23 containing FCGR2A, FCGR2B,
and FCRLA was associated with multiple immune traits. FcGR
genes encode immunoglobulin Fc surface receptors found on
macrophages, dendritic cells, and neutrophils, as well as B and
NK cells, and are involved in the regulation of B cell antibody production
and phagocytosis of immune complexes. The main associated
immune traits were the frequencies of CD32+ inflammatory
dendritic cells and monocytes, as well as several T cell phenotypes.
Genetic variation in the FcGR gene locus is associated
with an increase in susceptibility to several autoimmune and infectious
diseases, including SLE, ankylosing spondylitis, HIV progression,
and several other syndromes. Given the profound impact of
this gene locus on particular immune cell subsets, altered function
of CD32+ dendritic cells could be key to increased susceptibility to
these autoimmune and infectious diseases.

Indeed, our demonstration of an association between the 
FCGR2A SNP and T cell phenotypes and/or inflammatory 
myeloid cells (e.g., imDCs) provides a potential link between 
this locus and autoimmune diseases with T cell etiology. This 
coding SNP results in variants of the Fc receptor that have 
different avidities for immunoglobulin and C-reactive protein; 
consequently, much current research is aimed at understanding 
the possible functional role of these alleles in autoimmunity. 
However, our data suggest a different possibility with a more 
proximal mechanistic link: that association is with a promoter/ 
enhancer SNP (in strong linkage disequilibrium with the coding 
FCGR2A SNP) that modulates expression of the negative regu- 
latory CD32b molecule on imDC, monocytes, and/or T cells. 

In addition to the wide range of diseases associated with the
FcRG locus, further examples include a SNP within a genetic
locus associated with Behçet’s disease (Kirino et al., 2013) that
is in tight linkage disequilibrium with a SNP controlling the frequency
of CD314−CD335+ early NK cells. We also report that a
genetic locus containing FAS and associated with juvenile idiopathic
arthritis (Hinks et al., 2013) is also associated with functionally
important T stem memory cells.

These findings illustrate a key value of our database and 
approach: the identification of candidate immune traits associ- 
ated with genetic loci of relevance to autoimmunity and infection. 
In summary, using one of the most comprehensive immune 



(B) Organization of the FcR locus of chromosome 1 showing the position of immunologically relevant genes. The positions of three SNPs are highlighted in color;
rs1801274 and rs10800309 are the two that are most closely associated with susceptibility to SLE. SNPs shown in green were in complete linkage disequilibrium
within the samples analyzed in our cohort.

(C) Sample expression profiles of CD32 on imDCs (top) and B cells (bottom). Shown are the fraction of cells that are CD32+ (in pink) and, for the B cells, the CD32
SPEL (in orange). Two pairs of dizygotic twins discordant for the genotype at rs1801274 are shown.

(D) The distribution of expression of CD32+ imDCs is shown by genotype at rs1801274.

(E) The expression of CD32 on the imDCs is not significantly associated with the genotype at rs10800309 by standard ANOVA (p = ns); however, the distributions
are clearly different by genotype. The combination of the genotype at rs10800309 and rs1801274 provides a dramatic distinction for the expression of CD32 on
imDC.

(F) The expression profile of CD32 on seven different myeloid populations is shown, broken down by the combined genotype of the two SNPs. Bars indicate
interquartile range.

phenotyping efforts to date, we identified numerous genetic loci
controlling key parameters of a normal human immune system. 
This comprehensive human immune phenotyping bioresource 
will allow the identification of critical immune phenotypes 
associated with common autoimmune and infectious diseases, 
ultimately leading to accelerated discovery of mechanisms of 
disease and response to therapy. 

EXPERIMENTAL PROCEDURES 
Samples 

This study was approved by the NIAID (NIH) IRB and London-Westminster NHS
Research Ethics Committee; all participants provided informed consent. The
discovery stage comprised 497 female participants from the UK Adult Twin
Register, TwinsUK, with full genotyping data on 460 subjects. The TwinsUK
cohort is described in detail in Moayyeri et al. (2012). Briefly, TwinsUK is a large
cohort of twins historically developed to study the heritability and genetics of
diseases with a higher prevalence among women. The study population is
not enriched for any particular disease or trait and is representative of the British
general population of Caucasian ethnicity. Selected twins were all female with
an age range of 41-77 (mean 61.4), and by self-report, 100% Caucasian with
most being of UK ancestry. From the subsequent genotype data, we excluded
a few individuals showing evidence of non-European ancestry as assessed
by principal component analysis comparison with HapMap3.

The replication stage included a further unrelated 172 TwinsUK participants
with whole-genome genotyping data on 169. The replication
samples were selected to match the characteristics (age and gender) of the
discovery data set. For this reason, the replication cohort included only Caucasian
women with an age range of 32-83 (mean 58.2). All subjects were nominally
healthy at the time of sample collection.

A total of 746 PBMC vials were analyzed: PBMC from the 669 twin speci- 
mens (plus 4 replicates) and 30 healthy controls from the US (two vials of 
PBMC from blood drawn six months apart were analyzed for 29 subjects, a 
replicate vial of PBMC from one of the two blood draws for 14 (of the 29), 
and one vial only for 1 subject). The samples in each stage were ordered 
such that twin or longitudinal control samples were analyzed in the same 
experimental run (each comprising 15-30 vials), whereas replicate control 
samples were analyzed in different experimental runs. Staining and data ana- 
lyses were otherwise performed blinded to identity. 

Immunophenotyping 

See Extended Experimental Procedures for cell processing details. 

Flow Cytometry and Data Analysis 

Cells were analyzed in 96-well plate format on an 18-color LSR (BD Biosci- 
ences) using an HTS unit. Each run on the flow cytometer was accompanied 
by a set of compensation controls of antibody-stained IgG kappa beads (BD 
Biosciences). Data were evaluated on FlowJo software v9.7 (FlowJo, LLC). 
Post-processing of data and visualizations were done with JMP v10 (SAS)
and SPICE v5.3 (NIAID; Roederer et al., 2011). 

Gating 

A graphic depicting the fluorescence distribution of all samples in the discovery
cohort is shown in Figure S1. Figures S2 and S3 illustrate the gating hierarchy
for a single sample. The Extended Experimental Procedures describe
the generation of the ~80,000 gates analyzed.



For the discovery cohort, there were 20 experimental runs for 543 samples 
(501 twin specimens from 497 subjects and 42 US control specimens from 14 
subjects). Within each run, uniform scatter gating was used. Each sample was 
gated on time (to eliminate spurious events from beginning or end of sample 
run); this gate could vary by sample. With two major exceptions, all samples 
received the same fluorescence gating. After the third run, we chose to replace 
the CD4 reagent in panels 1, 2, and 5 due to poor performance; this necessitated
a different CD4 gating for those samples. Similarly, the reagents in the
“dump” channel of panel 7 were modified after 6 runs. 

For the replication cohort, there were 8 experimental runs for 203 samples 
(172 twin specimens from 172 subjects and 31 US control specimens from 
16 subjects). All analysis procedures were identical to the discovery cohort. 
Minor modifications to the panels were necessitated by unavailability of the 
same lots of reagents, but these did not impact enumeration of subsets. Spec- 
imen processing for the replication cohort was initiated after final analysis of 
the discovery cohort. 

Genotyping 

Genotyping was conducted with a combination of Illumina arrays
(HumanHap300 and HumanHap610Q) (Richards et al., 2008; Soranzo
et al., 2009). The Illuminus calling algorithm (Teo et al., 2007) was used to
assign genotypes. No calls were assigned if an individual’s most likely geno- 
type was called with less than a posterior probability threshold of 0.95. Vali- 
dation of pooling was achieved via a visual inspection of 100 random, shared 
SNPs for overt batch effects. Finally, intensity cluster plots of significant 
SNPs were visually inspected for over-dispersion, biased no calling, and/or 
erroneous genotype assignment. SNPs exhibiting any of these characteristics
were discarded. Stringent QC measures were performed on the genotypes
prior to data analysis. The sample exclusion criteria were: (1) sample
call rate < 98%; (2) heterozygosity across all SNPs > 2 SD from the sample
mean; (3) evidence of non-European ancestry as assessed by principal
component analysis comparison with HapMap3 populations; and (4) observed
pairwise identity by descent (IBD) probabilities suggestive of sample identity
errors. In addition, misclassified monozygotic and dizygotic twins were corrected
based on IBD probabilities. The exclusion criteria for SNPs were: (1)
Hardy-Weinberg equilibrium (HWE) p value < 10⁻⁶, assessed in a set of unrelated
samples; (2) minor allele frequency (MAF) < 1%, assessed in a set of
unrelated samples; and (3) SNP call rate < 97% (SNPs with MAF > 5%) or < 99%
(for 1% < MAF < 5%). Alleles of both data sets from the genotyping
arrays were aligned to HapMap2 or HapMap3 forward strand alleles.
Imputation was performed using the IMPUTE v2 software package (Howie 
et al., 2009). After imputation of the 2,986,407 SNPs available for analysis, 
1,419,558 SNPs passed further QC (call rate > 95%, MAF > 0.05, 
HWE > lO""^) and were used for analysis. 
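
The pre-imputation SNP exclusion rules listed above (HWE p value, minor allele frequency, and a call-rate cutoff that depends on MAF) can be expressed as a small filter. The sketch below is illustrative Python, not the pipeline used in the study: the function name and the example values are hypothetical, the HWE cutoff is passed in by the caller, and a SNP with MAF of exactly 5% is assumed to fall under the stricter 99% call-rate rule because the text does not say which rule applies at the boundary.

```python
def passes_snp_qc(call_rate: float, maf: float, hwe_p: float,
                  hwe_threshold: float) -> bool:
    """Return True if a SNP survives the three exclusion criteria.

    call_rate and maf are fractions in [0, 1]; hwe_p is the HWE test p value.
    """
    if hwe_p < hwe_threshold:      # deviates from Hardy-Weinberg equilibrium
        return False
    if maf < 0.01:                 # too rare (MAF < 1%)
        return False
    # Common SNPs (MAF > 5%) need a 97% call rate; rarer ones need 99%.
    required_call_rate = 0.97 if maf > 0.05 else 0.99
    return call_rate >= required_call_rate


# Hypothetical SNPs, using 1e-6 as an example HWE cutoff
common_ok = passes_snp_qc(call_rate=0.98, maf=0.20, hwe_p=0.5, hwe_threshold=1e-6)
rare_low_call = passes_snp_qc(call_rate=0.98, maf=0.03, hwe_p=0.5, hwe_threshold=1e-6)
```

Here `common_ok` is True, while `rare_low_call` is False because a 98% call rate does not meet the 99% requirement for a low-frequency SNP.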

Statistical Analyses 

Selection of Subsets for Heritability and GWAS Analysis 

Full details are in the Extended Experimental Procedures. In brief, we eliminated
CSFs with frequencies below 0.1% or above 99% (Figure S4A). From
these, we selected ~200 for in-depth analysis based on heritability or description
in the literature (“canonical”).

Genome-wide Association Analysis 

Because of relatedness in the TwinsUK cohort, we utilized the GenABEL soft- 
ware package (Aulchenko et al., 2007), which is designed for GWAS analysis of 
family-based data by incorporating pairwise kinship matrix calculated using 



Figure 5. Genetic Associations of the FcR Locus with Lymphoid Immunophenotypes 

(A) The genotype of rs365264 (close to CD16a on chromosome 1) is strongly associated with the proportion of CD56+CD16− (“early”) NK cells that are CD2+CD158a+CD158b+.

(B) The genotype of rs1801274 is associated with the frequency of memory IgG+ B cells that are CD27+CD38−CD20− as well as the fraction of CD27+ cells. Note
that, for this case, a lower frequency of the subset (left) is associated with higher protein expression (right).

(C) Similarly, the genotype of rs1801274 is strongly associated with the cell-surface expression of CD27 on CD8 T cells.

(D) The genotype of rs1801274 is strongly associated with the cell-surface expression of CD161 on CD4 T cells, as well as CD4 T cells that are
GD16rPDrGGR4^ 

(E) CD8 T cells express low levels of CD32 depending on genotype as shown by flow cytometry.

(F) The fraction of CD8 T cells that express CD32 is strongly associated with the rs10800309:rs1801274 diplotype (see Figure 4).

Bars indicate interquartile ranges. 
Table 2. Overlapping Associations with Complex Diseases

Immune Trait Association | | | | | | Disease Association | | |
Gene (Chr) | Marker | Trait ID | Trait Phenotype | Beta | p Value | Reported SNP | Disease | Best P | Reference
FCGR2A (1) | rs1801274 | P7:110 | iMDC: %CD32+ | 0.2 | 1.6E-23 | rs1801274 (A) | IBD | 2.1E-38 | (Jostins et al., 2012)
 | | MFI:189 | CD27 on IgA+ B | 0.13 | 6.5E-11 | | Kawasaki disease | 7.4E-11 | (Khor et al., 2011)
 | | MFI:212 | CD27 on IgG+ B | 0.14 | 5.4E-12 | | ulcerative colitis | 2.2E-20 | (Anderson et al., 2011)
 | | MFI:231 | CD161 on CD4 T | 0.13 | 2.6E-11 | | ankylosing spondylitis | 1.4E-09 | (Cortes et al., 2013)
 | | MFI:504 | CD27 on CD4 T | 0.15 | 5.4E-14 | | SLE | 6.8E-07 | (Harley et al., 2008)
 | | MFI:552 | CD27 on CD8 T | 0.19 | 1.3E-21 | | HIV progression | 1.0E-04* | (Forthal et al., 2007)
 | | P7:224 | CD1c+ mDC: %CD32 | 0.12 | 4.4E-09 | | lymphoma | 0.006* | (Wang et al., 2006)
 | | | | | | | malaria | 0.013* | (Sinha et al., 2008)
FCGR2A (1) | rs10494359 | P7:110 | iMDC: %CD32+ | 0.34 | 2.5E-29 | rs10494360 (r² = 0.94) | rheumatoid arthritis | 9.3E-05* | (Eyre et al., 2012)
ZNF804A (2) | rs6755404 | P2:3367 | Effector CD4: CD127−PD1− | 0.09 | 3.8E-04 | rs6755404 (A) | malaria | 1.2E-06 | (Band et al., 2013)
MIR216B (2) | rs6751715 | P3:5372 | CD8 T: %CXCR3−R4+R6+R10− | -0.09 | 1.5E-05 | rs6751715 | HIV | 1.1E-06 | (Fellay et al., 2009)
 | | P3:5661 | CD8 T: %CD16r | -0.07 | 5.1E-04 | | | |
 | | P6:197 | IgA+ B: %CD27−CD38− | 0.07 | 7.0E-04 | | | |
 | | P3:5658 | CD8 T: %CD16rPD1- | -0.07 | 7.8E-04 | | | |
LOC100505836 (3) | rs2593321 | Lin:19 | %NKT | 0.08 | 5.5E-04 | rs2593321 | HIV-1 control | 7.7E-06 | (Pelak et al., 2010)
MICA (6) | rs4418214 | P6:112 | IgE+ B: %CD20−CD27−CD38+ | 0.12 | 7.6E-04 | rs4418214 (C) | HIV-1 control | 1.4E-34 | (Pereyra et al., 2010)
BTNL2 (6) | rs3817963 | P2:12609 | CD8 T: %CD25XD38M5RO^ | -0.08 | 3.5E-04 | rs3817963 (A) | hepatitis C liver cirrhosis | 1.3E-08 | (Urabe et al., 2013)
 | | P2:10486 | CD8 T: %CD25XD38^ | -0.08 | 4.1E-04 | | | | (Urabe et al., 2013)
HLA-DQB1 (6) | rs2856718 | P2:12609 | CD8 T: %CD25XD38M5RO^ | -0.08 | 5.3E-04 | rs2856718 (A) | hepatitis B | 4.0E-37 | (Mbarek et al., 2011)
 | | P2:10486 | CD8 T: %CD25XD38^ | -0.07 | 7.8E-04 | | | |
EHMT2 (6) | rs652888 | MFI:578 | CD3 on CD8 T | -0.08 | 5.7E-04 | rs652888 | chronic hepatitis B | 7.1E-13 | (Kim et al., 2013)
MTCO3P1 - HLA-DQA2 (62) | rs4273729 | P2:10486 | CD8 T: %CD25XD38^ | 0.08 | 1.4E-04 | rs4273729 | chronic hepatitis C | 1.7E-16 | (Duggal et al., 2013)
 | | P2:12609 | CD8 T: %CD25XD38M5RO^ | 0.08 | 2.3E-04 | | | |
ADGB (6) | rs2275606 | P1:12906 | CD8 T: %TSCM | 0.13 | 6.3E-04 | rs2275606 (A) | leprosy | 3.9E-14 | (Zhang et al., 2011)
ABO (9) | rs8176722 | MFI:1 | CD123 on mDC | -0.14 | 1.4E-04 | rs8176722 (A) | malaria | 8.9E-10 | (Band et al., 2013)
ACTA2,FAS (10) | rs7069750 | P1:6601 | CD8 T: %TSCM | -0.14 | 4.1E-12 | rs7069750 (C) | juvenile idiopathic arthritis | 2.9E-08 | (Hinks et al., 2013)
 | | | | | | rs2147420 (r² = 1) | CLL | 3.1E-13 | (Berndt et al., 2013)
KLRC4-KLRK1 (12) | rs1049172 | P4:3551 | NK: %CD314−CD158a+ | -0.13 | 8.8E-09 | rs2617170 (r² = 0.922) | Behçet’s disease | 1.3E-09 | (Kirino et al., 2013)
 | | P4:4832 | NK: %CD314−CCR7− | -0.23 | 7.6E-22 | | | |
 | | P4:5538 | NK: %CD314−CD335+ | -0.27 | 1.4E-30 | | | |
MMP16 (8) | rs160441 | P4:3551 | NK: %CD314−CD158a+ | -0.08 | 2.3E-05 | rs160441 | tuberculosis | 8.4E-06 | (Thye et al., 2010)
ARHGAP20 (11) | rs1469170 | P4:3551 | NK: %CD314−CD158a+ | -0.08 | 5.6E-04 | rs1469170 (A) | malaria | 8.0E-08 | (Band et al., 2013)

Association results from the discovery cohort immune trait analyses are reported in the first six columns. The trait ID is fully described in Table S2. The disease-associated variant, the pathology, and the
disease-associated SNP’s best-reported p value are indicated in the following columns, respectively. The risk alleles presented correlate with increased disease susceptibility, and the r² between
disease SNP and immune trait, if different, is reported in parentheses after the reported SNP. *Disease association is significant with candidate SNP method/gene approach.
genotyping data in the polygenic model to correct for relatedness and hidden
population stratification. The score test implemented in the software was 
used to test the association between a given SNP and the trait. Additional quality
control was conducted on the association results: minor allele frequency
(MAF) > 0.1, Hardy-Weinberg equilibrium < 10“®, SNP call rate > 90%. For
our results, we used a genome-wide significance threshold of p < 3.3 ×
10⁻¹⁰. This threshold represents a standard genome-wide significance
threshold (p < 5 × 10⁻⁸) further adjusted for 151 independent tests (the number
of traits analyzed for GWAS in the discovery cohort). Because there is a high
intercorrelation among some of the 151 analyzed traits, this is a very conser- 
vative approach and ensures the robustness of our findings. Genome-wide 
validations were also performed on normalized residuals (after corrections 
for age), with GenABEL taking into account family structure in the model. 
The validation p value threshold was set to p < 0.05. 
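
The adjusted discovery threshold follows from dividing the standard genome-wide threshold by the number of traits tested. The short Python sketch below reproduces that arithmetic; the function name is illustrative and is not part of GenABEL or any other software used in the study.

```python
def bonferroni_threshold(base_alpha: float, n_tests: int) -> float:
    """Divide a significance level by the number of tests (Bonferroni)."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return base_alpha / n_tests


# Standard genome-wide threshold (5e-8) adjusted for the 151 analyzed traits.
gwas_threshold = bonferroni_threshold(5e-8, 151)
print(f"Discovery threshold: p < {gwas_threshold:.1e}")
```

Because many of the 151 traits are intercorrelated, the effective number of independent tests is smaller than 151, which is why this threshold is described as very conservative.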

Results from the genome-wide analyses of all the analyzed traits for both 
the discovery and the replication cohorts were meta-analyzed. Fixed effects 
inverse-variance weighted meta-analyses were conducted using METAL 
(Willer et al., 2010), with significance evaluated at p < 5 × 10⁻⁸.
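
For readers unfamiliar with fixed-effects inverse-variance weighting, the following Python sketch shows the calculation for a single SNP-trait pair. It is a minimal illustration of the general technique, not METAL's implementation, and the effect sizes and standard errors in the example are hypothetical.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Combine per-cohort effect estimates with inverse-variance weights.

    Returns the pooled effect, its standard error, the z score, and a
    two-sided p value from the normal distribution.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal in length")
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided normal p value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical discovery and replication estimates for one SNP-trait pair.
beta, se, z, p = ivw_fixed_effect(betas=[-0.27, -0.25], ses=[0.02, 0.04])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

The cohort with the smaller standard error dominates the pooled estimate, which is the intended behavior of a fixed-effects model.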

Finally, correlation against the FCRG “diplotype” shown in Figures 4 and S8 
was done using JMP; ANOVA p values without correction for family structure 
are reported. 

Identifying Traits Correlated with SNPs

Once we had identified significant associations between the selected 151 
traits and all SNPs based on standard analyses, we looked for pleiotropic 
effects of those SNPs. Approximately 1,200 SNPs from about 18 unlinked 
loci were correlated against all traits. 

To correct for multiple comparisons, we evaluated the covariance among all 
traits. As shown in Figure S4B, there are fewer than 1,800 traits that show a
covariance less than 0.7. Thus, Bonferroni correction sets a significance
threshold of p < 2.8 × 10⁻⁵. In our analyses, which covered about 20 unlinked
loci, we used a more conservative threshold, considering a Wilcoxon rank association
of trait values with SNP genotype significant only if p < 10⁻⁵.

GWAS Catalog Look-Ups

In order to ascertain whether the variants from the associations of the immune 
cell subsets analyses overlapped with significant disease associations, 
the GWAS catalog (http://www.genome.gov/gwastudies/), Immunobase 
(http://immunobase.org/page/Welcome/display), and SNPedia (http://www. 
snpedia.com) repositories were used. Gene regions and variants reaching 
genome-wide significance were searched on these repositories and overlap- 
ping associations or correlations with proxy SNPs or those in strong LD are 
reported in Table 2. 

BioData Repository 

Raw and summary data are available for downloading and analysis. Genotype 
data are available upon request to the authors. See Extended Experimental 
Procedures for detailed downloading instructions. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, eight 
figures, and six tables and can be found with this article online at http://dx.doi. 
SUMMARY

Noncoding RNAs (ncRNAs) function with associated 
proteins to effect complex structural and regula- 
tory outcomes. To reveal the composition and dy- 
namics of specific noncoding RNA-protein complexes 
(RNPs) in vivo, we developed comprehensive identifi- 
cation of RNA binding proteins by mass spectrometry 
(ChIRP-MS). ChIRP-MS analysis of four ncRNAs 
captures key protein interactors, including a U1-specific
link to the 3' RNA processing machinery. Xist,
an essential lncRNA for X chromosome inactivation
(XCI), interacts with 81 proteins from chromatin 
modification, nuclear matrix, and RNA remodeling 
pathways. The Xist RNA-protein particle assembles 
in two steps coupled with the transition from pluripo- 
tency to differentiation. Specific interactors include 
HnrnpK, which participates in Xist-mediated gene 
silencing and histone modifications but not Xist 
localization, and Drosophila Split ends homolog 
Spen, which interacts via the A-repeat domain of 
Xist and is required for gene silencing. Thus, Xist 
lncRNA engages with proteins in a modular and
developmentally controlled manner to coordinate 
chromatin spreading and silencing. 

INTRODUCTION 

Many long noncoding RNAs (lncRNAs) have recently been recognized
as functional regulators of gene expression (Rinn and Chang,
2012), but their mechanisms of action are largely unknown.
RNA binding proteins (RBPs) play key roles in lncRNA-mediated
gene regulation, and obtaining the full interaction map of proteins
bound to a lncRNA of interest is critical to our understanding of
its function. Many tools have been developed to describe 
RNA-protein interaction from a protein-centric view, typically 
by immunoprecipitating a protein and analyzing the associated 
RNAs with a microarray or high-throughput sequencing (re- 
viewed by Riley and Steitz, 2013). In contrast, fewer methods 
are available from the perspective of a particular RNA. This is 
usually achieved by (1) tagging the RNA with affinity aptamers,
which involves complicated genetic engineering; (2) using in
vitro-transcribed RNA to retrieve proteins from native cell lysates
(RNA chromatography), which is prone to the formation of nonphysiological
RNA-protein interactions; and (3) using immobilized
oligonucleotides to capture RNA:protein complexes under
native conditions, which suffers from both post-lysis re-associations
and unpredictable specificity of target RNA retrieval (reviewed
by Chu et al., 2015). The ideal strategy should capture
in vivo lncRNA-protein interactions, achieve high yield and specificity
without genetic tagging, and provide comprehensive portraits
of the lncRNP in diverse biological states.

Xist is a lncRNA (17 kb long in the mouse) required for X chromosome
inactivation (XCI) of one of the two X chromosomes in
female cells, thus enabling dosage compensation between XX 
females and XY males (Gendrel and Heard, 2011). XCI takes 
place early in embryonic development and is thought to occur 
in multiple steps: counting and choosing the X chromosome to 
silence, spreading of Xist over the target X chromosome, and 
silencing of most of its active genes (Payer and Lee, 2008). The 
latter two steps are believed to be mediated by specific Xist- 
associated protein factors, which remain largely mysterious. 
Xist expression marks the future inactive X chromosome (Xi) 
and is sufficient to recruit silencing chromatin modification complexes
such as the Polycomb proteins (Gendrel and Heard, 2011). It
has been debated whether Xist RNA physically recruits one or 
more silencing factors or whether Xist indirectly promotes tran- 
scriptional silencing via reinforcement of repressive chromatin. 
XCI is also developmentally regulated in several important 
ways. In the mouse, XCI can proceed by random inactivation 
of either paternal or maternal chromosome in somatic cells or 
by always inactivating the paternally derived X in extra-embry- 
onic cells, a process called imprinted XCI (Takagi and Sasaki, 
1975). During random XCI, Xist is not expressed in pluripotent 
embryonic stem cells (ESCs) and is upregulated during differen- 
tiation (Wutz and Jaenisch, 2000). Ectopic Xist RNA coating can 
induce gene silencing in ESCs, although this is reversible during 
an early differentiation time window, becoming irreversible at 
later stages (Wutz and Jaenisch, 2000). Knowledge of the Xist 
lncRNP in these diverse states may provide insights into this
classic and intricate epigenetic system. 

Here, we introduce comprehensive identification of RBPs by 
mass spectrometry (ChIRP-MS), an optimized method for the 
identification of the lncRNA-bound proteome. Applying ChIRP-MS
to four noncoding RNAs, we found known and validated novel 
Figure 1. ChIRP-MS Method and Validation

(A) Outline of the ChIRP-MS workflow. Briefly, RNP
complexes are crosslinked in vivo by 3% formaldehyde
for 30 min and solubilized by sonication.
Target ncRNAs are pulled out by biotinylated antisense
oligos, and associated proteins are eluted
with free biotin and separated by electrophoresis.
Each size fraction is subjected to LC-MS/MS
identification.

(B) Distribution of input and U1- and U2-enriched 
RNA sizes, as determined by Bioanalyzer (Agilent). 

(C) Proteins retrieved by U1, U2, U3, and control
probes, analyzed by immunoblotting. Arrow indicates
the U1A close homolog, U2B, cross-identified
by U1A antibody.

(D) Proteins retrieved by U1, U2, non-targeting
probe control, and RNase-treated controls, visualized
by silver staining. Major proteins enriched
are indicated on the left.
functional interactors. By performing Xist ChIRP-MS in different
cell states, lineages, and cell types and with mutant Xist alleles, 
we uncover mechanisms of dynamic and coordinate assembly 
of Xist binding partners, suggesting an organizing principle for 
lncRNPs.

RESULTS 

ChIRP-MS Method 

Building on ChIRP-seq, a method using DNA oligonucleotides
to capture lncRNAs and their genomic DNA binding sites (Chu
et al., 2011), we optimized ChIRP-MS to identify lncRNA-associated
proteins (Figure 1A). We cross-link cells extensively with
formaldehyde, retrieve target RNA with oligonucleotide hybridi- 
zation, and use a gentle biotin-elution to liberate associated 
proteins. The enriched proteins were identified by liquid chromatography-tandem
mass spectrometry
(LC-MS/MS). We conducted negative
controls using non-interacting control
probes, RNase treatment of lysate
prior to ChIRP, or genetic removal of the
target RNA.

As a proof of principle, we performed 
ChIRP-MS of human U1 and U2 snRNAs 
in HeLa S3 cells. The snRNAs are ideal 
for validating ChIRP-MS because they 
are abundant (~1 million copies of U1 
per cell) (Gesteland and Atkins, 1993) 
and the spliceosome composition is well 
known (Pomeranz Krummel et al., 2009; 
Stark et al., 2001; Zhou et al., 2002). 
Furthermore, non-canonical roles of U1 
in preventing premature mRNA cleavage 
and polyadenylation have been recently 
reported (Almada et al., 2013; Berg 
et al., 2012; Kaida et al., 2010), implying 
potential novel interactors that ChIRP- 
MS may discover. We designed anti- 
sense DNA oligonucleotides targeting U1 and U2 snRNAs, 
respectively, in regions previously found to be accessible for 
morpholino binding, and as a negative control, we chose a 
non-targeting probe that does not bind any human RNA (Berg 
et al., 2012; Kaida et al., 2010). While the input RNA spread 
over a large size range (due to shearing by sonication) with 
distinct tRNA peaks, after ChIRP-enrichment, the two snRNAs 
predominated (Figure 1B). U1 probe retrieved the known direct
binding protein U1A, whereas the control probe did not. U2
probe also enriched for U1A, although the indirect interaction resulted
in reduced enrichment. U2 probe also retrieved known
U2-binding protein U2B, which cross-reacts with U1A antibody
due to their close homology (arrow, Figure 1C). ChIRP of U3,
an abundant small nucleolar RNA not involved in splicing, specif- 
ically retrieved the nucleolar protein fibrillarin, but not U1A 
(Figure 1C). Beta-actin (ACTB) was not enriched by any probe,
serving as another negative control. These results indicate that
ChIRP is specific even for very abundant RBPs. 

U1 and U2 ChIRP-MS Reveal Known and Novel 
Interactors 

We next scaled up experiments for MS-level analysis, including 
both RNase and non-targeting probe controls. Silver staining of 
ChIRP samples showed that U1 and U2 probes pulled down
a rich set of proteins from HeLa lysates, whereas all control samples
were clean (Figure 1D), indicating that ChIRP-MS is highly specific
at the proteome level. U1 and U2 ChIRP-MS enriched
(by >log2 3.5 or >10-fold, see Experimental Procedures) more
than 400 proteins over respective negative controls (Figure 2A,
full peptide count list in Table S1). The results were highly
reproducible regardless of control strategy: for U1, 98% overlap
between RNase and non-targeting probe controls; 99%
for U2. The near-identical results from using two orthogonal 
methods for background removal highlight the robustness of 
the protocol. 
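
As a rough illustration of how enrichment over a negative control and agreement between two control strategies can be computed, consider the Python sketch below. The exact procedure is described in the Experimental Procedures, which are not reproduced here, so the pseudocount, the 10-fold cutoff as a default parameter, the function names, and the placeholder protein names and peptide counts are all assumptions made for this example.

```python
import math


def log2_enrichment(target_count, control_count, pseudocount=1.0):
    """log2 ratio of peptide counts, with a pseudocount to avoid division by zero."""
    return math.log2((target_count + pseudocount) / (control_count + pseudocount))


def enriched_proteins(target, control, min_fold=10.0, pseudocount=1.0):
    """Proteins whose enrichment over the control exceeds min_fold."""
    cutoff = math.log2(min_fold)
    return {
        protein
        for protein, count in target.items()
        if log2_enrichment(count, control.get(protein, 0), pseudocount) > cutoff
    }


def overlap_fraction(reference, other):
    """Fraction of the reference set that is also present in the other set."""
    return len(reference & other) / len(reference) if reference else 0.0


# Hypothetical peptide counts for one targeting probe and two control types.
target_counts = {"ProtA": 120, "ProtB": 95, "ProtC": 12}
rnase_control = {"ProtA": 2, "ProtC": 10}
probe_control = {"ProtA": 1, "ProtB": 0, "ProtC": 9}

enriched_vs_rnase = enriched_proteins(target_counts, rnase_control)
enriched_vs_probe = enriched_proteins(target_counts, probe_control)
agreement = overlap_fraction(enriched_vs_rnase, enriched_vs_probe)
print(sorted(enriched_vs_rnase), sorted(enriched_vs_probe), agreement)
```

A high overlap fraction between the two enriched sets corresponds to the reproducibility across control strategies reported above.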

U1 and U2 snRNAs shared their RBPs extensively (309 in 
common, or 74% of U1 and 84% of U2-RBPs), as predicted 
from their common cellular function (Figure 2A). Both U1 and 
U2 strongly enriched for proteins involved in splicing and pre- 
mRNA biogenesis, as anticipated (Figure S1A). Together, the 
two snRNAs retrieved 79% of the human spliceosome compo- 
nents (Figure 2A) and 8 of 9 direct U1 binding proteins verified 
by crystal structure (Pomeranz Krummel et al., 2009; Stark 
et al., 2001; Ruepp et al., 2008). Analysis of known protein- 
protein interaction networks showed that the vast majority 
(96%) of all proteins identified were within two degrees of
separation from the core spliceosome (Figure 2B) or the direct
binding proteins of U1 (Figure S2A) (Pomeranz Krummel et al., 
2009; Zhang et al., 2012; Stark et al., 2001), suggesting that 
ChIRP-MS yields the immediate and most relevant protein 
network. Organization of U1/U2 interactomes into complexes 
based on curated protein interaction data confirmed extensive 
coverage of the spliceosome, SMN, and cap binding com- 
plexes (Figure 2C). 

U1 selectively enriched for the CSTF complex involved in 
pre-mRNA cleavage and polyadenylation, a recently described 
non-canonical function of U1 (Figures 2C and S2B; Gene
Ontology (GO) term “RNA 3'-end processing” in Figure S1A)
(Berg et al., 2012; Kaida et al., 2010). Immunoblots validated
U1-selective pull-down of CSTF2 over other snRNAs, which
potentially explains this U1-exclusive function (Figure S1B)
(Berg et al., 2012) and shows that ChIRP is specific for proximal
interactions even within the same complex (e.g., the spliceosome).
These and other protein complexes discovered represent
a wealth of information for the snRNP community (Figures
2C and S1A).



Xist Ribonucleoprotein Complex Purification 

We next turned to discover the protein partners of Xist. ChIRP- 
MS of Xist represents a substantial challenge in several ways: 
(1) Xist is far less abundant than U1 (<2,000 copies per cell 
versus 1 million) (Buzin et al., 1994), making it more relevant to 
other regulatory lncRNAs; (2) the Xist transcript is long and will be
sheared into fragments, requiring a tiling-probe strategy not 
necessary for the study of U1/U2; (3) Xist is chromatin and nu- 
clear matrix associated and therefore insoluble even by deter- 
gent and nuclease extraction (Clemson et al., 1996). Based on 
these considerations, we designed 43 probes against the mouse 
Xist RNA (Table S2). In a female mouse cell line (Neuro2a), we 
confirmed that Xist RNA was completely solubilized by sonicat- 
ion (data not shown), and over 60% of Xist RNA was selectively 
retrieved without enrichment of housekeeping Gapdh mRNA 
(Figure 3A). 

Xist probes retrieved rich protein analytes compared to the 
RNase control (Figure 3B). The most abundant proteins retrieved
are HnrnpK, U, and M, the first two readily visualized by
Coomassie blue (Figure 3B). HnrnpU is required for the spread
of Xist RNA across the chromosome in cis (Hasegawa et al.,
2010) and thus serves as a positive control. Xist-dependent retrieval of all three
proteins was validated by ChIRP-western, proving that they 
are not retrieved by virtue of their sheer abundance; the control 
protein beta-actin was not enriched (Figure 3C). 

Stepwise and Developmentally Regulated Assembly of 
Xist RNP 

We carefully selected biological systems for Xist ChIRP-MS
that represent different stages of Xist-mediated silencing
(Figure 4A). Although Xist is expressed in most differentiated 
female cells, it is largely dispensable for the maintenance of 
XCI (Brown and Willard, 1994; Csankovszki et al., 1999). To 
ensure that we catch Xist “in action,” we chose a male mouse 
ESC line that has been genetically engineered to harbor a Xist 
cDNA knocked into chromosome 11 (chr11) that is inducible
by doxycycline (dox) (Wutz and Jaenisch, 2000). The exogenous
Xist localizes to chr11 and silences chr11 genes at a long distance
after 4 days of sustained expression and retinoic acid-
induced differentiation (Wutz and Jaenisch, 2000) (Figure 4A, 
lanes 1 and 2). Turning on or off Xist transcription with dox 
creates an isogenically controlled experiment. Furthermore, the 
relatively rapid initiation of Xist silencing ensures synchronicity 
among cells, suppressing noise arising from population heterogeneity.
To study the endogenous Xist lncRNP, we performed
parallel ChIRP-MS in an epiblast stem cell line (EpiSC) (Gillich 
et al., 2012). EpiSCs are derived from E5.5-E6.5 epiblasts and 
represent cells that have just undergone random XCI (occurring 
~E5.5) (Hayashi and Surani, 2009; Rastan, 1982; Takagi et al.,
1982) (Figure 4A, lane 3). Finally, we performed Xist ChIRP-MS 



Figure 2. U1/U2 ChIRP-MS 

(A) Venn diagram of known spliceosome proteins and proteins pulled down by U1 or U2. The number of interactions in each set is given after the set label. 

(B) Numbers of U1/U2 pulled-down proteins by their degrees of separation from known spliceosome proteins. The dashed line represents the distribution of a 
randomly simulated set of the same number of proteins pulled down by U1 and U2 (right axis). 

(C) Protein-protein and protein-RNA interaction network of U1/U2 pulled-down proteins. Proteins belonging to known complexes are organized and annotated in 
groups in top half of the plot, and proteins of unknown affiliation are presented at the bottom. Complexes and proteins more strongly enriched by U1 (left in graph) 
(e.g., Polyadenylation and cleavage, Nop56p) are positioned accordingly. 
in trophoblast stem cells (TSCs), where the paternal X chromosome
is always silenced (Calabrese et al., 2012), a phenomenon
termed imprinted XCI that contrasts with the random XCI in so- 
matic cells (Figure 4A, lane 4). RNase controls were performed 
side by side in the EpiSC and TSC experiments. 

We also compared Xist ChIRP-MS to ChIRP-MS of three 
abundant nuclear RNAs— U1, U2, and 7SK— to evaluate Xist- 
specific interactions (Experimental Procedures). 7SK is a snRNA 
present at ~200,000 copies per cell and is involved in transcriptional
elongation control. We ranked peptides enriched by each
ncRNA and prioritized proteins that had Xist ChIRP-MS enrich- 
ment ranking at least 2-fold better than rankings in any of the 
three comparator ncRNAs. 
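
The rank-based prioritization can be sketched in Python as follows. This is one plausible reading of the criterion, not the authors' code: rank 1 is taken to be the most enriched protein, the rank ratio is the best comparator rank divided by the Xist rank, and a protein absent from a comparator is treated as having infinite rank there. Protein names and counts are placeholders.

```python
import math


def rank_by_enrichment(counts):
    """Map each protein to its rank (1 = most enriched); ties broken by name."""
    ordered = sorted(counts, key=lambda protein: (-counts[protein], protein))
    return {protein: position + 1 for position, protein in enumerate(ordered)}


def rank_ratio(protein, xist_ranks, comparator_ranks):
    """Best (lowest) comparator rank divided by the Xist rank."""
    best_other = min(ranks.get(protein, math.inf) for ranks in comparator_ranks)
    return best_other / xist_ranks[protein]


def xist_specific(xist_counts, comparator_counts, min_ratio=2.0):
    """Proteins ranked at least min_ratio-fold better for Xist than for any comparator."""
    xist_ranks = rank_by_enrichment(xist_counts)
    comparator_ranks = [rank_by_enrichment(c) for c in comparator_counts]
    return {
        protein
        for protein in xist_ranks
        if rank_ratio(protein, xist_ranks, comparator_ranks) >= min_ratio
    }


# Hypothetical enrichment values for Xist and three comparator ncRNAs.
xist = {"ProtA": 300, "ProtB": 150, "ProtC": 120, "ProtD": 80}
u1 = {"ProtD": 200, "ProtE": 150, "ProtB": 40}
u2 = {"ProtF": 180, "ProtD": 90}
sevensk = {"ProtG": 100, "ProtD": 10}

specific = xist_specific(xist, [u1, u2, sevensk])
print(sorted(specific))
```

Under this reading, a protein with a rank ratio below 2 is one that another ncRNA enriches nearly as well as Xist does, matching the "rank ratio < 2" label used for non-specific hits in Table S4.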

In total, we identified 81 Xist binding proteins from the four ex- 
periments (Figure 4A and full list of enriched proteins with pep- 
tide counts reported in Table S3). When compared to U1/U2/ 
7SK, only a minority of Xist hits (30/81) was also highly enriched 
by another ncRNA (rank ratio < 2, Table S4). These non-specific 
proteins are mainly involved in RNA processing (GO enrichment 
p = 8.4E-28) and may be involved in nuclear ncRNA splicing, nu- 
clear retention, or stability. They are likely bona-fide Xist binding 
proteins because they pass RNase and genetic controls, but 
they may not contribute to the specific gene regulatory function 
of Xist. We provide the list of non-specific proteins retrieved by
all four nuclear ncRNAs as a resource for the field (in red, Table
S4). In contrast, the Xist-specific proteins selectively enriched for
gene repressors (GO enrichment p = 9.6E-8), which are highlighted
in Table S4 and discussed below. We also overlapped
the set of proteins retrieved by Xist with those retrieved by
two other abundant nuclear lncRNAs, NEAT1 and MALAT1,
and found limited overlap (14 out of 81 shared by all three,
Figure S2C) (West et al., 2014). As expected, the majority of overlapping
proteins (8/14) are “nonspecific
ChIRP hits” as defined above.

Xist ChIRP-MS in all four cell types
retrieved a common set of proteins (62/81,
77%), termed Set 1. An additional 19
proteins interacted with Xist only in differ- 
entiated ESC, EpiSC, and TSCs; these 
proteins are termed Set 2. We describe 
the identity of proteins in these two sets 
and then discuss the dynamics of the in- 
teractions. Some of the binding proteins 
were known factors involved in XCI. We 
identified Rnf2 (also known as Ring2 or 
Ring1b), the catalytic subunit of Polycomb
repressive complex 1 (PRC1) that
deposits the repressive lysine 119 monoubiquitination on histone
H2A (H2AK119ub) over the inactive X chromosome (de Napoles
et al., 2004; Fang et al., 2004). Other PRC1 components identi- 
fied included Pcgf5 and Rybp (both in set 1); Rybp is a stoichio- 
metric component of PRC1 that has been shown to accumulate 
on the Xi independently of PRC2 (Tavares et al., 2012). We also 
found the Sin3-HDAC1 components Spen, Sap18, and 
Mybbp1a, which are repressive transcriptional factors that recruit
histone deacetylase (HDAC) complexes. Histone deacetylation
correlates with reduced gene expression and is another
hallmark of the inactive X chromosome (Keohane et al., 1996). 
The co-purification of these proteins may bridge the biochemical 
gap between Xist and HDAC that remains little explored in the 
field. Xist ChIRP-MS also recovered nuclear matrix proteins 
HnrnpU, Matrin 3, and Safb, consistent with the observation 
that Xist is probably anchored by nuclear matrix (Clemson 
et al., 1996). Notably, HnrnpU is required for Xist localization (Hasegawa
et al., 2010). Finally, RBPs such as HnrnpK strongly and
specifically interacted with Xist; HnrnpK was not retrieved by U1 
or U2. Collectively, the two sets of proteins represent candidate 
factors that could play roles in Xist localization or function. 
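
The Set 1 and Set 2 grouping amounts to simple set operations on the per-cell-type hit lists. The Python sketch below assumes Set 1 is the intersection across all four cell types and Set 2 is every remaining interactor not detected in undifferentiated ESCs; the cell-type keys and protein identifiers are placeholders, and only the percentages are taken from the counts reported in the text.

```python
def partition_interactors(hits_by_cell_type, pluripotent="ESC"):
    """Split interactors into a shared set and a differentiation-only set."""
    hit_sets = [set(hits) for hits in hits_by_cell_type.values()]
    all_hits = set().union(*hit_sets)
    shared = set.intersection(*hit_sets)  # found in every cell type (Set 1)
    differentiation_only = all_hits - set(hits_by_cell_type[pluripotent])  # Set 2
    return shared, differentiation_only


def percent(part, whole):
    """Rounded percentage, as reported in the text."""
    return round(100 * part / whole)


# Hypothetical hit lists for the four experiments.
hits = {
    "ESC": {"P1", "P2"},
    "ESC_diff": {"P1", "P2", "P3"},
    "EpiSC": {"P1", "P2", "P3", "P4"},
    "TSC": {"P1", "P2", "P3"},
}
set1, set2 = partition_interactors(hits)
print(sorted(set1), sorted(set2))

# Reported counts: 62 of 81 proteins in Set 1; 77 of 81 seen in differentiated cells.
print(percent(62, 81), percent(77, 81))
```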

Comparison of Xist interactors in the four cell types revealed 
a potential step-wise assembly of Xist binding proteins from 
the pluripotent state to differentiation. Unsupervised hierarchical 
analysis showed that the Xist interactors are distinct in ESCs, 
whereas the differentiated ESC, EpiSC, and TSC shared a signif- 
icant degree of overlap (Figure 4A). Whereas Set 1 proteins 
remain associated with Xist from pluripotency to differentiation, 
Xist interaction with Set 2 proteins is observed only upon differ- 
entiation. Xist interacted with both Set 1 and Set 2 proteins in the 
latter three cell types; 77 of 81 Xist interactors (95%) were 
independently retrieved in these differentiated cells. The HDAC 



408 Cell 161 , 404-416, April 9, 2015 ©2015 Elsevier Inc. 






[Figure 4 panel labels. (A) Columns: ESC, Diff, EpiSC, TSC. Row annotations: cell type; Xist locus, transgenic (Chr11) or endogenous (ChrX); long-range silencing, modest or strong; imprinted. (B) Rnf20 labeled; r = 0.67.]

Figure 4. Xist Partner Proteins Are Developmentally Regulated

(A) Heatmap of Xist-RBPs pulled down in the four 
experiments. Color bars indicate abundance of 
peptides detected. Protein annotations were color 
designated based on their class. 

(B and C) (B) Similar proteins are enriched between 
differentiating ES cells versus EpiSCs and (C) 
between EpiSCs and TSCs. 
complex subunit Spen straddles these categories because it interacts
with Xist in ESC, but the interaction intensifies with differentiation
(asterisk in Figure 4A). The distinction between Set 1
and Set 2 is unlikely to be due to lower efficiency of Xist ChIRP-MS in
ESCs because the quantitative signal for Set 1 proteins in ESC
is on par with that in differentiated cells. While the Set 1 proteins
may represent the ground state of the Xist interactome that prepares
the lncRNA for action, the differentiation-coupled Xist interactors
include intriguing chromatin-modifying proteins such
as Spen, Rnf20, Mybbp1a, and Sap18. These may represent
additional silencing factors recruited to Xist RNA when XCI is in
full action. Quantitative comparison between Xist ChIRP-MS in
differentiated ESC versus EpiSC or versus TSC showed that
they are largely similar, especially for the strong interactors (r =
0.67 and 0.85, respectively; Figures 4B and 4C). These results
suggest that (1) transgenic Xist indeed phenocopies the endog- 
enous RNA and shares similar binding proteins; (2) ChIRP-MS is 
robust and gives consistent results in multiple systems; (3) 
random XCI and imprinted XCI appear to employ nearly identical 
Xist-associated proteins, and therefore, extraembryonic troph- 
oectoderm likely executes silencing in ways that are highly 
similar to that of the embryo proper. 

HnrnpK Participates in Xist-Mediated Gene Silencing

To assess the functional importance of Xist-interacting proteins 
in gene silencing, we tested their dispensability in Xist-mediated 
silencing of the imprinted Grb10/Meg1 gene, previously shown 
to be silenced by Xist upon differentiation of ESC (Wutz and Jaenisch,
2000). The imprinted Grb10 gene is located 41 megabases
away from the Xist transgene on chr11 and is thought to be monoallelically
expressed from the chr11 harboring the transgene (Figure
5A) (Wutz and Jaenisch, 2000). We showed it indeed was
silenced by transgenic Xist (Figure 5B).

We chose to first target HnrnpK, M, and 
U because they represent some of the 
most enriched Set 1 proteins (especially K) 
and because this simple heuristic identifies 
HnrnpU, a known key mediator of Xist 
function. Upon siRNA-mediated depletion 
(Figure 5C), only HnrnpU and HnrnpK had 
significant effects on Grb10 silencing (Fig- 
ure 5D). We ruled out off-target effects 
by showing that all four individual siRNAs 
against HnrnpK produced the same de- 
repression effect (Figure S4A). We directly 
visualized transcription from the Xist- 
silenced allele using two-color RNA-FISH 
(Figures 5E and 5F). We used a genomic (BAC) probe, allowing 
us to detect the Grb10 nascent transcript rather than its mature 
mRNA. In this way, we scored for the presence or absence of 
Grb10 transcription adjacent to the Xist-coated chr11. HnrnpK
depletion significantly increased the frequency of an active Grb10
allele found close to or within the Xist RNA-coated chromosome
11, indicating that, indeed, it is less sensitive to Xist-mediated
silencing in the absence of HnrnpK (Figures 5E and 5F). 

We also tested the requirement for HnrnpK in endogenous XCI 
in EpiSC. We converted ESC into EpiSC in the presence of Fgf2 
and Activin (Guo et al., 2009). EpiSC conversion was confirmed 
by morphologic changes, marker expression, induction of Xist 
expression, and Xist localization to the Xi (Figures S3A-S3D). 
We performed single-molecule fluorescent in situ hybridization 
(sm-FISH) on Usp9x, an X-linked gene that is subject to random 
X-inactivation. We used FISH probes against the introns of 
Usp9x gene to exclusively detect its pre-mRNA that indicates 
active transcription. Although only 10% of the cells show two 
Usp9x pinpoints in control cells, HnrnpK- or HnrnpU-depleted 
cells showed a 2- to 3-fold increase in cells with two Usp9x 
FISH signals (Figures S3E and S3F). The reduction in successful 
XCI for HnrnpU depletion matched observations from a prior 
study (Figure S3F) (Hasegawa et al., 2010). We conclude that 
HnrnpK is an important factor for Xist-mediated silencing. 

HnrnpK Contributes to Xist-Mediated Chromatin
Modifications, but Not Xist Biogenesis or Localization 

We tested potential roles of HnrnpK early in the sequence of 
repressive events, including Xist biogenesis, localization, and 
spreading or chromatin silencing. Northern blot analysis showed 
that neither Xist abundance nor splicing was affected by depletion
of HnrnpK, U, or M (the two minor isoforms seen upon HnrnpU
depletion are consistent with a previous report) (Hasegawa et al.,
2010) (Figure 6A), although we cannot exclude that minor
changes occurred given that Xist is present in multiple isoforms.
Next, sm-FISH confirmed that HnrnpU depletion indeed delocal- 
ized Xist, but HnrnpK depletion did not (Figure 6B). Combined 
immunofluorescence and RNA FISH (IF-coFISH) showed that, 
while Xist RNA colocalized with H2AK119ub and H3K27me3, 
HnrnpK depletion significantly reduced the accumulation of 
H2AK119ub and H3K27me3 on the Xi without affecting Xist 
RNA localization (Figures 6C and 6D). HnrnpK depletion did 
not affect the global level of H3K27me3, showing that HnrnpK 
has a specific impact on Xist-mediated recruitment of repressive 
chromatin marks (Figure S4B). Given that both H3K27me3 and 
H2AK119ub modifications are among the earliest epigenetic 
changes occurring to the Xi, the results are consistent with our 



Figure 5. Functional Characterization of 
Xist RBPs 

(A) Relative positions of Grb10 and Xist transgene 
(TG) on chr11. 

(B) Induction of Xist and repression of Grb10 
by different doses of dox in e36 cells that have 
undergone RA-induced differentiation for 4 days. 

(C) Western validation of HnrnpU, K, and M 
knockdown by siRNAs. 

(D) De-repression of Grb10 upon depletion of 
HnrnpU, K, and M. 

(E) Dual-color FISH of Grb10 and Xist in e36 cells 
that are depleted of HnrnpK. Arrowheads indicate 
Grb10 allele escaping Xist silencing. 

(F) Quantification of cells with Grb10 expression on
the Xist-coated chromosome by counting >150 
cells from 3 replicates. 



hypothesis that HnrnpK is a novel regu- 
lator of the initiation of X-inactivation. 
Xist ChIRP retrieved multiple PRC1 sub- 
units, and PRC1 or PRC2 action can 
mutually recruit each other (Blackledge 
et al., 2014; Cooper et al., 2014; Kalb 
et al., 2014). Indeed, HnrnpK depletion 
also spatially dissociated Xist from the 
PRC2 subunit Eed (Figure S5). HnrnpK 
contains three RNA-binding KH domains 
that may directly bind Xist. UV-cross-link- 
ing RNP immunoprecipitation followed by 
RT-PCR (CLIP-qRT-PCR) showed that 
HnrnpK directly bound Xist RNA, with 
the strongest interaction mapping down- 
stream of repeat F in exon 1 (Figure S4C). 
HnrnpK retrieved Xist more efficiently in 
CLIP than HnrnpU, a known direct inter- 
action that we reproduced (Hasegawa 
et al., 2010). 



The A-Repeat of Xist Interacts with
Spen to Mediate Gene Silencing

We next explored the use of ChIRP-MS to 
dissect domain-specific interactions of 
Xist with its partner proteins. A small 0.9 kb region on the very 
5' end of Xist that harbors the conserved A-repeat element is 
required for transcriptional silencing, but not for chromatin inter- 
action or spreading across the X chromosome (da Rocha et al., 
2014; Wutz et al., 2002). In principle, deletion of A-repeat may 
alter RNA folding or modification to abrogate interaction of 
most of the silencing proteins; alternatively, the A-repeat may 
be selectively required for the interaction of a small number of 
key silencing factors. ChIRP-MS appears to be an ideal 
approach to distinguish between these models. Xist ChIRP-MS 
of ES cells harboring inducible full-length Xist or A-repeat mutant 
at the endogenous locus on the X chromosome (Wutz et al., 
2002) revealed that most protein interactions were not affected 
by the deletion, but three proteins— Spen, Rnf20, and Wtap— 
were completely unable to bind the mutant (Figure 7A). Notably,
Figure 6. HnrnpK Is Required for Repressive Chromatin Modifications of Inactive X

(A) Northern blot against Xist in e36 cells depleted of HnrnpM, U, or K. 

(B) Xist sm-FISH in HnrnpU and K knockdown cells. 

(C and D) IF co-FISH of Xist and H3K27Me3 (C)/H2AK119ub (D) in HnrnpK knockdown cells. Number of cells with strong, weak and undetectable repressive 
marks overlapping with Xist foci were tallied and represented below. 



Spen interaction with Xist is increased upon ESC differentiation, 
and Rnf20 and Wtap both belong to Set 2 proteins that interact 
with Xist only upon differentiation (Figure 4A). Thus, the A-repeat 
appears to be a focus of the differentiation-coupled assembly of 
Xist RNP. The exclusive binding of these three proteins to full- 
length Xist, but not the A-repeat mutant, was confirmed by 
ChIRP-western (Figure S6A). This result also implied that HnrnpK 



binding does not require A-repeat, which we confirmed by 
ChIRP-western (Figure S6A). Thus, two sets of silencing proteins 
bind to different domains of Xist. 

We reasoned that one or more of the A-repeat binding factors 
may be required for XCI. RNAi depletion in ES cells harboring 
wild-type Xist of each of these proteins, as well as Rnf40, a func- 
tional partner of Rnf20, showed that only depletion of Spen, but 
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Figure 7. Xist A-Repeat Binds Spen, a Silencing Factor that Contributes to XCI 

(A) Similar proteins are enriched by ChIRP-MS of full-length Xist and A-repeat mutant, except three highlighted proteins: Wtap, Rnf20, and Spen. 

(B) siRNA depletion of the indicated factors shows that only Spen is required for X-linked silencing of Pgk1.



not the other proteins, dramatically reduced Xist-mediated
silencing of the X-linked gene Pgk1 (Figure 7B). Rnf20 and 
Rnf40 depletion actually slightly increased Pgk1 silencing, which 
is consistent with their known roles in enhancing transcription 
(Figure 7B) (Zhu et al., 2005). In addition, by two-color RNA- 
FISH, we found that Spen depletion results in reduced silencing 
of the X-linked genes Mecp2 and Rnf12, with more frequent 
detection of nascent transcription from the Xist-coated inactive 
X chromosome in Spen-depleted cells compared to control cells 
or to cells depleted for Rnf20, Rnf40, or Wtap (Figures 7C and 
7D). This experiment further illustrates that Spen is apparently
not required for Xist RNA accumulation or spreading across
the Xi but is specifically needed for transcriptional silencing, 
which is consistent with its specific association with the A-repeat 
region of Xist. Furthermore, we validated the requirement of 
Spen for silencing of Grb10 in ESCs, where an Xist transgene 
is ectopically expressed on chr11 (Figure S6B). Collectively, 
these results suggest that Spen could be a functional mediator 
of Xist-RNA-driven gene silencing. 

Spen is the mouse homolog of Drosophila homeotic mutant 
Split ends and encodes a transcriptional repressor (Arieti et al., 
2014; Shi et al., 2001). Spen contains at least three RNA recognition
motifs (RRMs) that can bind the lncRNA SRA to mediate
RNA-directed transcriptional regulation (Arieti et al., 2014; Shi 
et al., 2001). Several existing Spen antibodies tested were not 
suitable for UV CLIP. Instead, we generated recombinant 
Spen RRM domains by in vitro translation and found that two 
or three of the Spen RRMs preferentially retrieved with Xist 
A-repeat over GFP mRNA in vitro (Figure S6C). These results 
suggest that Spen RRM domains may interact directly with the 
Xist A-repeat region. 

DISCUSSION 

ChIRP-MS: An RNA-Centric Interactome Technology 

ChIRP-MS provides a potentially universal interactome discov- 
ery strategy that can be readily applied to any RNA of interest. 
We found comparable results from RNase-treated samples or 
isogenic cells that lack the target RNA, suggesting application 
in non-genetic systems. The use of different cross-linking re- 
agents allows the investigator to potentially tune the degree of 
interactions captured from the target RNA. The thorough re-dis- 
covery of the spliceosome complex proteins by ChIRP-MS of U1 
and U2 snRNPs validates the robustness of ChIRP-MS. In 
addition, the novel factors found in U1, U2, and Xist RBPs (functionally
validated in the latter) demonstrate the added sensitivity
of ChIRP-MS over traditional methods of RBP identification and
provide a rich resource for future investigations. For example,
U1-specific interaction with the cleavage stimulation and
polyadenylation proteins has direct implications for “telescripting,”
a critical process of U1-mediated protection from premature
mRNA shortening (Berg et al., 2012; Kaida et al., 2010).



Dynamic Plug-and-Play of Xist Binding Proteins 

Our analysis revealed two sets of proteins that interact with Xist 
in a developmentally regulated manner. As Xist expression and 
reversibility of Xist-mediated gene silencing are tightly coupled 
to ESC differentiation, Xist may gain new silencing functions, 
perhaps through newly acquired or strengthened protein interac- 
tions, upon exit from pluripotency. Consistent with this idea, “Set 
2” proteins bind Xist exclusively in differentiating ESCs (and 
EpiSCs and TSCs); this developmentally controlled assembly 
of Xist RNP provides a fail-safe backup for premature Xist 
expression during pluripotency. The expression of most factors 
in Set 2 remains stable throughout the differentiation of mESC 
into mEpiSC (< 2-fold change), as measured by whole-nucleus 
proteomic analysis (Song et al., 2012) (Figure S6D). Thus, the 
vast majority of Set 2 interactions are most parsimoniously ex- 
plained by a change in Xist RNA that now allows interaction 
with a pre-existing set of proteins. In contrast, the compositions 
of Xist-RBPs are strikingly similar in differentiating ES cells, 
EpiSCs, and TSCs. TSCs are derived from extra-embryonic tro- 
phectoderm cells, where the inactive X is always paternal (Takagi 
and Sasaki, 1975). It remains a standing debate in the field 
whether imprinted XCI differs from random XCI merely by a 
simple choice mechanism while sharing the same silencing 
machinery or whether the imprinted versus random XCI are 
fundamentally different. Our observations support the former 
hypothesis and suggest that the difference between random 
versus imprinted XCI is focused on the choice mechanism of 
the future Xi. 

HnrnpU and HnrnpK emerged as the most enriched Xist-asso- 
ciated factors, and both functionally contribute to XCI. Although 
HnrnpU is required for Xist spreading across the X chromosome,
HnrnpK knockdown affects Xist-directed deposition of silencing
histone modifications H2AK119ub and H3K27me3, the products
of PRC1 and PRC2 complexes, respectively. Xist appears to
directly bind PRC1, but not PRC2. This is consistent with recent
reports demonstrating the PRC1-dependent recruitment of
PRC2 complex (Blackledge et al., 2014; Cooper et al., 2014;
Kalb et al., 2014). It has been reported that PRC2 binds specifically
to the repeat A (repA) transcript of Xist, which is produced
as a separate and shorter RNA (1.6 kb, including the A-repeat
region) (Zhao et al., 2008), although the exact function of this
shorter transcript remains unclear. One explanation for our find- 
ings could be the existence of different RNA isoforms with 
different functions. Further tests will be required to dissect the 
events by which Polycomb proteins associate with Xi. 

Modular Xist RNA Domains Link Spen- and 
HnrnpK-Mediated Silencing 

Although the A-repeat was proposed to recruit PRC2 complex 
(Zhao et al., 2008), PRC2 itself is dispensable for the initiation 
of gene silencing during XCI (Kalantry and Magnuson, 2006). 
Furthermore, in the A-repeat deletion Xist mutant, PRC2 and 



(C) siRNA depletion of Spen interferes with XCI in cells, as indicated by co-localization of Xist “cloud” and active transcription of X-linked genes Rnf12 and Mecp2
(arrowheads) on the same chromosome. 

(D) Quantification of cells with expression of Mecp2 and Rnf12 on the Xist-coated chromosome by counting >100 cells from 3 replicates. A proportion of cells
(around 40%) does not upregulate Xist and shows no Xist coating; we counted only the cells with Xist domains.

(E) Model of the cell-state- and scaffold-specific loading of Xist-RBPs and their chromatin-modifying functions.

H3K27me3 are still recruited to the Xist-coated chromosome (da
Rocha et al., 2014; Plath et al., 2003). Imaging studies suggested 
that Xist RNA creates a transcriptionally inactive nuclear compartment,
independent of the A-repeat, but that the A-repeat is
required for the movement of genes into this compartment as 
they become silenced (Chaumeil et al., 2006). These 
observations suggest that factors beyond PRC2 are at play. 

Our results revealed the A-repeat, which is essential for Xist-mediated
gene silencing (Wutz et al., 2002), as a key element for the
developmentally regulated binding of several proteins. The selective
abrogation of three protein interactions but full preservation
of all others by the A-repeat deletion highlights the modular
organization of Xist. We found Spen, a potent transcriptional 
repressor, to be important for Xist-mediated silencing. Spen 
interaction with Xist is increased upon differentiation, suggesting 
a gain of Spen-associated silencing activity to the Xist RNP. The 
Spen knockout is embryonic lethal at E12.5 (Kuroda et al., 2003),
which is later than expected if XCI is fully defective. However, the 
knockout was not performed with a maternal germline depletion 
of the protein, so an earlier phenotype masked by the maternal 
pool cannot be ruled out. On the other hand, Spen may well 
collaborate with other Xist-recruited silencing activities, and 
there may also be potential redundancy with two other mammalian
Split ends homologs.

The reported association between Spen and MBD3-NuRD 
complex nominates several gene-silencing pathways, including 
ATP-dependent nucleosome remodeling, histone deacetylation 
via HDACs, and modulation of DNA methylation (Shi et al.,
2001; Zhang et al., 1999). NuRD complex decommissions ESC 
enhancers to enable differentiation and lineage commitment— 
the same developmental window where XCI takes place (Reynolds
et al., 2012; Whyte et al., 2012). It is conceptually appealing
that the same silencing mechanism that turns off pluripotency 
regulators may both enable Xist expression (by removing 
repression of Xist) and endow Xist with the silencing power to 
achieve XCI. Intriguingly, Spen interacts with Mbd3 (Shi et al., 
2001); NuRD recruitment to active enhancers is believed to occur
through Mbd3 recognition of 5-hydroxymethylcytosine (Yildirim 
et al., 2011). NuRD-mediated deacetylation of H3K27ac also 
permits PRC2-mediated H3K27me3 and gene silencing (Rey- 
nolds et al., 2012). Thus, the combination of NuRD and Poly- 
comb activity can turn an active gene into an inactive one. We 
propose that Xist may serve as a physical scaffold for organizing
at least two chromatin modification activities (a writer to
deposit silencing marks via PRC1 and an eraser to remove active
marks via Spen and associated factors) that, together, coordinately
enforce permanent epigenetic silencing (Figure 7E).

Although the other two A-repeat associating factors do not 
directly impact XCI in our limited analysis, they could conceptually
still contribute to XCI. Rnf20 is the E3 ubiquitin ligase for
H2BK120ub1, a histone modification that marks the gene bodies
of transcriptionally active genes (Zhu et al., 2005). Xist has been
proposed to preferentially target actively transcribed genes on the X
chromosome, exploiting the spatial proximity of actively transcribed
loci to efficiently target Xist-associated silencing factors
(Engreitz et al., 2013; Simon et al., 2013). Furthermore, the 
A-repeat mutant of Xist shows reduced binding to such active 
regions (Engreitz et al., 2013), which may be explained by the 



inability of the Xist A-repeat mutant to seek out Rnf20 complex 
loaded on active loci. Finally, Wtap is involved in the installation 
of N6-methyladenosine (m6A) on RNAs. Wtap binding to
the A-repeat of Xist is consistent with the presence of m6A in 
the same region of the RNA (data not shown). The functional 
impact of Wtap binding or m6A modification remains to be 
understood but represents an exciting perspective given the 
strategic importance of the domain in question. Our results set 
the stage for future structure-function analysis of Xist and its in- 
teracting proteins as a paradigm to understand functional motifs 
in lncRNAs.

EXPERIMENTAL PROCEDURES
ChIRP-MS 

Ten to twenty 15 cm dishes of cells were used per ChIRP-MS experiment (100-500
million cells, depending on the cell type). Cell harvesting, lysis, disruption,
and ChIRP were performed essentially as previously described (Chu et al.,
2012), with the following modifications: (1) cells were cross-linked in 3%
formaldehyde for 30 min, followed by 0.125 M glycine quenching for 5 min; (2)
hybridization can be started late in the day and left running overnight to reduce
hands-on time; (3) for MS experiments, lysates were pre-cleared by incubating
with 30 μl washed beads per ml of lysate at 37°C for 30 min with shaking (prior
to hybridization, beads were removed twice from lysate using a magnetic
stand); (4) for the RNase control, lysates were pooled first and aliquoted into two
equal amounts. 1/1,000 volume of 10 mg/ml RNase A (Sigma) was added to
the RNase control sample, and both control and non-treated samples were
incubated at 37°C for 30 min with mixing prior to hybridization steps. This can be
done concurrently with pre-clearing. RNA extraction can be performed from a
small aliquot of post-ChIRP beads as described (Chu et al., 2012). For protein
elution, beads were collected on a magnetic stand, resuspended in biotin elution
buffer (12.5 mM biotin [Invitrogen], 7.5 mM HEPES [pH 7.5], 75 mM NaCl,
1.5 mM EDTA, 0.15% SDS, 0.075% sarkosyl, and 0.02% Na-Deoxycholate),
mixed at room temperature (r.t.) for 20 min and at 65°C for 10 min. Eluent
was transferred to a fresh tube, and beads were eluted again. The two eluents
were pooled, and residual beads were removed again using the magnetic
stand. 25% total volume TCA was added to the clean eluent, and after thorough
mixing, proteins were precipitated at 4°C overnight. The next day, proteins
were pelleted at 16,000 rcf at 4°C for 30 min. Supernatant was carefully
removed from the belly side of tubes, and protein pellets on the spine of tubes
(sometimes invisible at this step) were washed once with cold acetone and
pelleted again at 16,000 rcf at 4°C for 5 min, and acetone was removed. Pellets
(much more visible now) were briefly centrifuged again and, after removal of
residual acetone, were left to air-dry for 1 min on the bench-top. Proteins were
then immediately solubilized in desired volumes of 1× Laemmli sample buffer
(Invitrogen) and boiled at 95°C for 30 min with occasional mixing for
reverse-crosslinking. Final protein samples were size-separated in bis-tris
SDS-PAGE gels (Invitrogen) for western blots or MS. See Extended Experimental
Procedures and Table S2 for ChIRP probe design.
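
As an illustration of the per-volume amounts quoted above, the following minimal Python sketch scales them to a given sample. The function name, argument names, and dictionary keys are hypothetical. The reading of "25% total volume TCA" as a TCA volume equal to 25% of the eluent volume is an assumption, not something the protocol states explicitly.

```python
def chirp_ms_volumes(preclear_lysate_ml, rnase_aliquot_ml, eluent_ml):
    """Scale per-volume reagent amounts from the protocol (inputs in ml, outputs in µl)."""
    if min(preclear_lysate_ml, rnase_aliquot_ml, eluent_ml) <= 0:
        raise ValueError("all volumes must be positive")

    # Pre-clearing: 30 µl washed beads per ml of lysate.
    beads_ul = 30.0 * preclear_lysate_ml

    # RNase control: 1/1,000 volume of a 10 mg/ml RNase A stock.
    aliquot_ul = rnase_aliquot_ml * 1000.0
    rnase_ul = aliquot_ul / 1000.0
    stock_ug_per_ml = 10.0 * 1000.0  # 10 mg/ml expressed in µg/ml
    rnase_final_ug_per_ml = stock_ug_per_ml * rnase_ul / (aliquot_ul + rnase_ul)

    # ASSUMPTION: "25% total volume TCA" is read as 0.25 x eluent volume.
    tca_ul = 0.25 * eluent_ml * 1000.0

    return {
        "preclear_beads_ul": beads_ul,
        "rnase_a_ul": rnase_ul,
        "rnase_a_final_ug_per_ml": rnase_final_ug_per_ml,
        "tca_ul": tca_ul,
    }
```

Under these assumptions, the RNase A final concentration works out to just under 10 µg/ml regardless of aliquot size, because the enzyme is always added at a fixed 1:1,000 ratio.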

Defining Proteins Identified by ChIRP-MS 

Potential MS artifacts were first filtered by removing low-confidence protein
hits with fewer than 9 peptides from a single gel-C slice and fewer than 16 total
peptides (a simpler cut-off of >10 peptides from any single gel-C slice was
used for U1/U2). Thereafter, a stringent cut-off of log2 > 3.5 between
experiment and control (>11.3-fold enrichment) is applied to eliminate
RNA-independent background interactions. Specific hits of 7SK ChIRP-MS will be
reported elsewhere. To define specific versus non-specific components of
the Xist lncRNP, ChIRP-MS hits from Xist (differentiated ESC), U1, U2, and
7SK were first ranked based on peptide abundance. Xist-specific interactors
are defined as proteins with Xist ChIRP-MS rank at least twice better than in
ChIRP-MS of U1, U2, and 7SK. Non-specific interactors are proteins that
show rank ratio < 2 in Xist ChIRP versus U1, U2, or 7SK. For the purpose of
comparison, mouse protein names of 7SK and Xist hits were replaced with
their human counterparts (no ambiguity).
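
The two filtering steps described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline. The `Hit` record, its field names, and the function names are hypothetical. Two points are assumptions: the peptide filter is read literally (a hit is removed only when it falls below both thresholds), and the enrichment is computed on total peptide counts with a pseudocount of 1 to avoid division by zero, which the text does not specify.

```python
import math
from dataclasses import dataclass

LOG2_CUTOFF = 3.5  # from the text; 2**3.5 is about 11.3-fold


@dataclass
class Hit:
    protein: str
    max_slice_peptides: int  # most peptides from any single gel slice
    total_peptides: int      # peptides summed over all slices (experiment)
    control_peptides: int    # peptides in the matched control


def passes_peptide_filter(hit: Hit) -> bool:
    # ASSUMPTION: removed only if below BOTH thresholds (<9 per slice and <16 total).
    return not (hit.max_slice_peptides < 9 and hit.total_peptides < 16)


def log2_enrichment(hit: Hit, pseudocount: float = 1.0) -> float:
    # ASSUMPTION: pseudocount handles proteins absent from the control.
    return math.log2(
        (hit.total_peptides + pseudocount) / (hit.control_peptides + pseudocount)
    )


def is_enriched(hit: Hit) -> bool:
    return passes_peptide_filter(hit) and log2_enrichment(hit) > LOG2_CUTOFF
```

A hit with many peptides in the experiment and none in the control passes both steps, whereas a hit with comparable counts in experiment and control is discarded as RNA-independent background.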
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Defining Xist-Specific RBPs versus Promiscuous RBPs 

The most enriched protein (most peptide counts in experiment) is ranked 1, the
second most enriched is ranked 2, and so forth. “Specific interactors” for Xist
are defined as proteins that have a rank that is at least 2-fold better than in all
three other ChIRP-MS experiments (U1, U2, and 7SK).
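
The rank-based definition above can be illustrated with a short Python sketch. The function names are hypothetical. Two choices are assumptions not stated in the text: ties in peptide count are broken alphabetically, and a protein absent from a comparison experiment is treated as having an infinitely poor rank there.

```python
import math
from typing import Dict, List


def rank_by_peptides(counts: Dict[str, int]) -> Dict[str, int]:
    """Rank 1 = most peptides. ASSUMPTION: ties are broken alphabetically."""
    ordered = sorted(counts, key=lambda protein: (-counts[protein], protein))
    return {protein: i + 1 for i, protein in enumerate(ordered)}


def xist_specific(
    xist_counts: Dict[str, int],
    other_counts: List[Dict[str, int]],
    fold: float = 2.0,
) -> List[str]:
    """Proteins whose Xist rank is at least `fold` better than in ALL other experiments."""
    xist_ranks = rank_by_peptides(xist_counts)
    other_ranks = [rank_by_peptides(counts) for counts in other_counts]
    specific = []
    for protein, rank in sorted(xist_ranks.items(), key=lambda item: item[1]):
        # ASSUMPTION: a protein missing from an experiment gets rank = infinity.
        if all(ranks.get(protein, math.inf) >= fold * rank for ranks in other_ranks):
            specific.append(protein)
    return specific
```

Here `other_counts` would hold the peptide counts from the U1, U2, and 7SK experiments; a protein that ranks within 2-fold of its Xist rank in any one of them is treated as a promiscuous binder.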

Knockdown Studies 

siRNAs and shRNAs were purchased from Dharmacon and Invitrogen. Transfection
was performed with Nucleofector or RNAiMAX. See Extended Experimental
Procedures and Table S5 for full details.

Microscopy 

Xist-FISH, Usp9x-FISH, and co-IF were performed with sm-FISH probes using a
standard protocol. All other dual-color FISH experiments were performed essentially as
previously described (Chaumeil et al., 2008). See Extended Experimental
Procedures for full protocols and the list of reagents used. 

RNA Crosslinking IP and Interaction Studies

CLIP-qRT-PCR was performed essentially as described (Flynn et al., 2015),
and triple FLAG-tagged, codon-optimized 2x RRM and 3x RRM Spen fragments
were used in in vitro interaction studies. See Extended Experimental
Procedures for full details. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, six 
figures, and five tables and can be found with this article online at
http://dx.doi.org/10.1016/j.cell.2015.03.025.
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Our study reported that miR-31 is a regulator of multiple mRNAs important for different aspects of breast cancer metastasis. We 
recently identified concerns with several figure panels in which original data were compiled from different replicate experiments 
in order to assemble the presented figure. The scope of the figure preparation issues includes compiling data from independent 
experiments to present them as one internally controlled experiment, statistical analyses based on technical replicates that are 
not reflective of the biological replicates, and comparisons of selectively chosen data points from multiple experiments. As many 
of the published figures are therefore not appropriate or accurate representations of the original data, we believe that the responsible 
course of action is to retract the paper. We apologize for any inconvenience we have caused. 
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More than 50 years ago, Jacob et al. (1964) proposed an elegant model for the regulation of DNA replication in bacteria. In the replicon model, the fundamental unit of DNA 
replication, the replicon, would be governed by a cis-acting replicator sequence and a trans-activating initiator factor. Despite the increased size and complexity of eukaryotic
genomes, eukaryotic DNA replication continues to be guided by the fundamental principles and concepts established in the replicon model. Eukaryotic origins of replication
(replicators) are defined by cis-acting sequences or structural DNA elements that are recognized by functionally conserved trans-acting initiator factors (ORC, Cdc6, Cdt1, and
Mcm2-7).

Prokaryotic Replication Origins 

E. coli has a single circular chromosome that is ~4.6 megabases in length, containing a single origin of replication (oriC). The two major cis-acting features of the ~250 bp oriC
are an AT-rich DNA unwinding element (DUE) and multiple 9 bp DnaA-binding motifs (Skarstad and Katayama, 2013).

DnaA is a AAA+ ATPase that recognizes both high- and low-affinity binding sites throughout oriC. High-affinity DnaA binding sites (R1-R4) are occupied throughout most of
the cell cycle; however, low-affinity binding sites become occupied only at replication initiation. Two accessory proteins, Fis and IHF, repress or stimulate initiation of DNA replication,
respectively, by altering the conformation of DNA surrounding oriC. Binding of a full complement of DnaA molecules (10-20) leads to DNA unwinding at DUE. This unwinding
stimulates DnaC, a AAA+ ATPase helicase loader, to assemble two homohexamers of the DnaB helicase complex on the newly melted DNA, forming the pre-replicative complex
(pre-RC) (Costa et al., 2013).

Eukaryotic DNA Replication Origins 

A consequence of the large size of eukaryotic genomes is that multiple origins of replication must be utilized in parallel to facilitate the complete duplication of the genome 
within the confines of S phase (Gilbert, 2004). There are ~350 origins of replication distributed throughout the S. cerevisiae genome. In contrast, there are an estimated
40,000-80,000 origins distributed throughout the much larger human genome.

As in bacteria, both cis- and trans-acting factors define start sites of eukaryotic DNA replication. Eukaryotic cis-acting replicator elements were first identified in the model
organism S. cerevisiae as autonomous replicating sequence (ARS) elements of ~200 bp. Each ARS element contains a conserved ARS consensus sequence (ACS), which, at
its core, is an 11 bp T-rich motif that is necessary but not sufficient for origin function. In addition to the ACS, there are several other poorly defined B sequence elements that 
contribute to helicase loading and DNA unwinding. In contrast, conserved replicator sequences that direct origin selection in higher eukaryotes have remained elusive. Recently, 
low-complexity GC-rich sequences that are able to form G-quadruplexes have been identified as potential replicator elements, suggesting that DNA secondary structure may 
play a role in origin licensing in higher eukaryotes (Cayrou et al., 2012). 

There is remarkable functional conservation between prokaryotic and eukaryotic trans-acting initiator factors. Analogous to DnaA, the origin recognition complex (ORC)
recognizes and binds to replicator origin sequences throughout the majority of the cell cycle. ORC is composed of six subunits, five of which are AAA+ ATPases. In G1 of the cell
cycle, another initiation factor and AAA+ ATPase, Cdc6, associates with ORC and coordinates the loading of a double hexamer of the minichromosome maintenance (Mcm2-7)
helicase complex via an interaction with Cdt1 to form the pre-RC (Bell and Kaguni, 2013). Formation of the pre-RC “licenses” the origin for potential activation in the subsequent
S phase. 

Eukaryotic cis-acting replicator elements are necessary but not sufficient for origin activity, as there are many more ACS motif matches and potential G-quadruplexes than
utilized origins of replication. Epigenetic features, including promoters, CpG islands, nucleosome organization, and post-translational modification of histones, also impact the 
selection and activation of eukaryotic replication origins (Ding and MacAlpine, 2011). A nucleosome-free region (NFR) with well-positioned flanking nucleosomes is a conserved 
feature of eukaryotic replicator sequences, and maintenance of the nucleosome free region at the origin is critical for function. Eukaryotic origins are typically associated with 
intergenic sequences. In higher eukaryotes, origins and ORC binding sites are frequently found in the NFR associated with transcription start sites and CpG islands, whereas in 
S. cerevisiae, ORC is excluded from transcription start sites. 

A diverse array of histone PTMs are correlated with ORC binding and origin function; however, only a handful of epigenetic marks have been mechanistically linked to specific 
steps in origin selection and activation (Dorn and Cook, 2011; Mechali et al., 2013). The methylation state of histone H4 lysine 20 in metazoans has been linked to ORC binding
and pre-RC assembly. Dimethylation of H4K20 by the methyltransferase Suv4-20h1/2 is recognized by the bromo-adjacent homology (BAH) domain of ORC1. In addition,
monomethylation of H4K20 by the cell-cycle-regulated methyltransferase, PR-Set7, promotes pre-RC assembly. The histone acetyltransferase (HAT) Hbo1 interacts with ORC,
Cdt1, and Mcm2 and is required for efficient pre-RC assembly. Presumably, Hbo1 acetylates origin-proximal nucleosomes on histone H4 to promote pre-RC assembly; however,
it remains possible that the target of Hbo1 is not histones but instead specific pre-RC components. Finally, although numerous epigenetic modifications and chromatin states
have been correlated with origin function, it is important to stress that correlation does not equal causation. The S. cerevisiae histone deacetylase Rpd3 represses global origin 
activation not by deacetylating histone H3 in the vicinity of replication origins but rather by regulating silencing of the origin-rich multicopy rDNA locus, which serves as a sink 
for sequestering key replication initiation factors (Yoshida et al., 2014). 
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